Abstract

This paper considers the problem of reliable communication over discrete-time channels whose
impulse responses have length $L$ and exactly $S \le L$ non-zero coefficients, and whose support
and coefficients remain fixed over blocks of $N > L$ channel uses but change independently from
block to block. Here, it is assumed that the channel's support and coefficient realizations are both
unknown, although their statistics are known. Assuming Gaussian non-zero coefficients and noise,
and focusing on the high-SNR regime, it is first shown that the ergodic noncoherent channel capacity
has pre-log factor $1 - \frac{S}{N}$ for any $L$. It is then shown that, to communicate with arbitrarily small
error probability at rates in accordance with the capacity pre-log factor, it suffices to use pilot-aided
orthogonal frequency-division multiplexing (OFDM) with $S$ pilots per fading block, in conjunction
with an appropriate noncoherent decoder. Since the achievability result is proven using a noncoherent
decoder whose complexity grows exponentially in the number of fading blocks $K$, a simpler decoder,
based on $S+1$ pilots, is also proposed. Its $\epsilon$-achievable rate is shown to have pre-log factor equal to
$1 - \frac{S+1}{N}$ with the previously considered channel, while its achievable rate is shown to have pre-log
factor $1 - \frac{S+1}{N}$ when the support of the block-fading channel remains fixed over time.
I. Introduction

We consider the problem of communicating reliably over an unknown sparse single-input
single-output (SISO) frequency-selective block-fading channel that is described by the discrete-time
complex-baseband input/output model

    $y^{(k)}[n] = \sqrt{\rho} \sum_{l=0}^{L-1} h^{(k)}[l]\, x^{(k)}[n-l] + v^{(k)}[n]$,    (1)

where $n \in \{0, \dots, N-1\}$ is the channel-use index, $k \in \{1, \dots, K\}$ is the fading-block index,
$x^{(k)}[n]$ is the transmitted signal, $y^{(k)}[n]$ is the received signal, and $v^{(k)}[n]$ is additive white Gaussian
noise (AWGN). Throughout, it will be assumed that the channel length $L$ obeys $L < N$. The channel
is "sparse" in the sense that exactly $S$ of the $L$ channel taps $\{h^{(k)}[l]\}_{l=0}^{L-1}$ are non-zero during each
fading block $k$, where the indices of these non-zero taps, collected in the set $\mathcal{L}^{(k)}$, can change with
fading block $k$. We will refer to this channel as "strictly sparse" when $S < L$, and as "non-sparse"
when $S = L$. Furthermore, the channel is "unknown" in the sense that the transmitter and receiver do
not know the channel realizations, although they do know the channel statistics, which are described
as follows.

Recalling that there are $M = \binom{L}{S}$ distinct $S$-element subsets of $\{0, \dots, L-1\}$, we write this
collection of subsets as $\{\mathcal{L}_i\}_{i=1}^{M}$. We then assume that the channel support $\mathcal{L}^{(k)}$ is drawn so that
the event $\mathcal{L}^{(k)} = \mathcal{L}_i$ occurs with prior probability $\lambda_i$, where $\mathcal{L}^{(k)}$ is drawn independently of
$\mathcal{L}^{(k')}$ for $k' \ne k$. We also assume that the vector $h_{nz}^{(k)} \in \mathbb{C}^S$ containing the non-zero taps
$\{h^{(k)}[l] : l \in \mathcal{L}^{(k)}\}$ has the circular Gaussian distribution¹ $h_{nz}^{(k)} \sim \mathcal{CN}(0, S^{-1} I)$, with $h_{nz}^{(k)}$
independent of $h_{nz}^{(k')}$ for $k' \ne k$. Finally, we assume that $v^{(k)}[n] \sim \mathcal{CN}(0, 1)$ with $v^{(k)}[n]$ independent
of $v^{(k')}[n']$ for $(k', n') \ne (k, n)$. We impose the power constraint
$\frac{1}{N} \sum_{n=0}^{N-1} E\{|x^{(k)}[n]|^2\} = 1 \ \forall k$, so that the
signal-to-noise ratio (SNR) becomes $\rho$ in (1).
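
The channel statistics just described can be sketched numerically. The following is a minimal illustration only: it assumes a uniform support prior ($\lambda_i = 1/M$), whereas the model allows any prior, and the function name `draw_sparse_channel` and the choice of NumPy are ours, not the paper's.

```python
import itertools
import math
import numpy as np

def draw_sparse_channel(L, S, rng):
    """Draw one fading block: a support of size S and i.i.d. CN(0, 1/S) taps.

    The uniform support prior (lambda_i = 1/M) is an assumption of this sketch.
    """
    supports = list(itertools.combinations(range(L), S))  # the M = C(L, S) candidates
    support = supports[rng.integers(len(supports))]
    # CN(0, 1/S): real and imaginary parts each have variance 1/(2S)
    h_nz = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2 * S)
    h = np.zeros(L, dtype=complex)
    h[list(support)] = h_nz
    return h, support, len(supports)

rng = np.random.default_rng(0)
h, support, M = draw_sparse_channel(L=8, S=3, rng=rng)
```

Calling the function once per fading block, with an independent draw each time, mimics the block-to-block independence assumed above.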

Our channel model is motivated by the results of recent channel-sounding experiments (e.g.,
[1]-[3]) which suggest that, as the communication bandwidth increases, the channel taps $\{h^{(k)}[l]\}_{l=0}^{L-1}$
become sparse in that the majority of them are "below the noise floor" [4, p. 2]. The same
behavior can be seen to manifest [5] in channel taps sampled from IEEE 802.15.4a [6] "ultra
¹ For ease of presentation, we assume that all non-zero channel taps have equal variance. All of our results except
Lemma 1 and Corollary 1 remain valid for any positive definite covariance matrix of $h_{nz}^{(k)}$, and both Lemma 1 and
Corollary 1 can be straightforwardly extended to the general case.
wideband" propagation-path-based continuous-time impulse responses after square-root-raised-cosine
pulse shaping.² Clearly, the fact that we use exactly zero-valued taps makes our channel model an
approximation, albeit a standard one (see, e.g., [4, p. 5]). In fact, our channel model ignores many
additional features³ of real-world channels in order to facilitate an information-theoretic analysis. In
addition, it should be emphasized that we assume a channel with exactly $S$ non-zero taps, as opposed
to at most $S$ non-zero taps, and a decoder that knows the channel statistics perfectly (including $S$,
$L$, $\{\lambda_i\}_{i=1}^{M}$, and $\rho$).

Notation: Above and in the sequel, we use lowercase boldface quantities to denote vectors,
uppercase boldface quantities to denote matrices, and we use $I$ to denote the identity matrix.
Also, we use $(\cdot)^T$ to denote transpose, $(\cdot)^*$ conjugate, $(\cdot)^H$ conjugate transpose, $(\cdot)^+$ pseudo-inverse,
and $\mathcal{D}(b)$ the diagonal matrix created from vector $b$. Furthermore, $\odot$ denotes element-wise multiplication,
$\|x\| = \sqrt{x^H x}$, and $\|x\|_A = \sqrt{x^H A x}$ for Hermitian positive semi-definite $A$. Throughout, "log"
denotes the base-2 logarithm. For random variables, we use $E\{\cdot\}$ to denote expectation, $\mathrm{cov}\{b\}$
auto-covariance, $h(a)$ entropy, and $I(a; b)$ the mutual information between $a$ and $b$. Finally, we
write $\mathcal{CN}(x; \mu, \Sigma) = (\pi^N \det(\Sigma))^{-1} \exp(-\|x - \mu\|^2_{\Sigma^{-1}})$ for the circular Gaussian pdf with
mean $\mu \in \mathbb{C}^N$ and positive definite covariance matrix $\Sigma$, and we write $x \sim \mathcal{CN}(\mu, \Sigma)$ to indicate
that random vector $x$ has this pdf. In Table I we list commonly used quantities, along with their
definitions.



A. Preliminaries 

Throughout the paper, we assume that the prefix samples $\{x^{(k)}[-l]\}_{l=1}^{L-1}$ are chosen as a cyclic
prefix (CP), i.e., $x^{(k)}[-l] = x^{(k)}[N-l]$ for $l = 1, \dots, L-1$. In this case, we can write the $k$th block
observations $y^{(k)} = (y^{(k)}[0], \dots, y^{(k)}[N-1])^T$ as

    $y^{(k)} = \sqrt{\rho}\, X^{(k)} h^{(k)} + v^{(k)}$,    (2)

² Say that $h_c^{(k)}(t) = \sum_{q=1}^{Q} a_q^{(k)} \delta(t - \tau_q^{(k)})$ is a continuous-time impulse response based on $Q$ propagation paths. When
the pulse shape $b_t(t)$ is used at the transmitter and $b_r(t)$ is used at the receiver, and the baud interval is $T$, the channel
taps become $h^{(k)}[l] = (b_t * h_c^{(k)} * b_r)(lT)$, where $*$ denotes convolution. For a detailed derivation, see, e.g., [5].

³ For example, in practice, the active taps $\{h^{(k)}[l]\}_{l \in \mathcal{L}^{(k)}}$ and additive noise might be non-Gaussian and/or correlated
within a fading block; the active taps, support, and noise might be statistically dependent and/or non-stationary across
fading blocks; and the linear channel assumption might break down due to power-amplifier non-linearities.
where $v^{(k)} \triangleq (v^{(k)}[0], \dots, v^{(k)}[N-1])^T$, $h^{(k)} \triangleq (h^{(k)}[0], \dots, h^{(k)}[L-1], 0, \dots, 0)^T \in \mathbb{C}^N$, and
$X^{(k)} \in \mathbb{C}^{N \times N}$ is the circulant matrix with first column $x^{(k)} \triangleq (x^{(k)}[0], \dots, x^{(k)}[N-1])^T$. An
equivalent model results⁴ from converting all signals into the frequency domain:

    $y_f^{(k)} = \sqrt{\rho}\, \mathcal{D}(x_f^{(k)})\, h_f^{(k)} + v_f^{(k)}$,    (3)

where $y_f^{(k)} = F y^{(k)}$, $x_f^{(k)} = F x^{(k)}$, $v_f^{(k)} = F v^{(k)}$, $h_f^{(k)} = \sqrt{N} F h^{(k)}$, and where $F$ denotes the
$N$-dimensional unitary discrete Fourier transform (DFT) matrix. Noting that $v_f^{(k)} \sim \mathcal{CN}(0, I)$, the
model (3) establishes that, when viewed in the frequency domain, the frequency-selective channel (1)
reduces to a set of $N$ non-interfering scalar subchannels with average⁵ subchannel SNR $\rho$. Although
the subchannels are non-interfering, the subchannel gains within the $k$th block (i.e., the elements of
the vector $h_f^{(k)}$) are correlated in a way that depends on the channel support $\mathcal{L}^{(k)}$, as will be detailed
in the sequel. For capacity analysis, we assume that the number of fading blocks $K$ is arbitrarily
large, and we ignore overhead due to the prefix, consistent with [7], [8]. Some implications of this
choice are discussed below.
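
The step from the circulant model (2) to the parallel-subchannel model (3) can be checked numerically. The sketch below is ours, not the paper's: it uses arbitrary illustrative sizes ($N = 7$, $L = 3$), omits the noise and the $\sqrt{\rho}$ scaling, and builds the unitary DFT matrix from `numpy.fft.fft`.

```python
import numpy as np

N, L = 7, 3
rng = np.random.default_rng(1)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)      # one block of time-domain inputs
h = np.zeros(N, dtype=complex)
h[:L] = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # taps, zero-padded to length N

# Circulant matrix with first column x: X[n, l] = x[(n - l) mod N]
X = np.array([[x[(n - l) % N] for l in range(N)] for n in range(N)])

# Unitary N-point DFT matrix F
F = np.fft.fft(np.eye(N)) / np.sqrt(N)

x_f = F @ x                  # frequency-domain input, x_f = F x
h_f = np.sqrt(N) * F @ h     # frequency-domain channel gains, h_f = sqrt(N) F h

time_domain = F @ (X @ h)    # noiseless part of (2), moved to the frequency domain
freq_domain = x_f * h_f      # noiseless part of (3): D(x_f) h_f
```

The two vectors agree to numerical precision, which is the diagonalization $X^{(k)} = F^H \mathcal{D}(\sqrt{N} F x^{(k)}) F$ in action.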

B. Existing Results on Noncoherent Channel Capacity 

Much is known about the fundamental limits of reliable communication over the unknown
non-sparse channel in the high-SNR regime (i.e., $\rho \to \infty$). For example, assuming that communication
occurs over an arbitrarily large number of fading blocks $K$, the ergodic capacity $C_{\text{non-sparse}}(\rho)$, in
bits per channel use, obeys [7], [8]

    $\lim_{\rho \to \infty} \frac{C_{\text{non-sparse}}(\rho)}{\log \rho} = 1 - \frac{L}{N}$.    (4)

In other words, the "multiplexing gain" [9] of the non-sparse channel (i.e., the pre-log factor in its
ergodic capacity expression) equals $1 - \frac{L}{N}$. Furthermore, it is possible to achieve this multiplexing
gain using pilot aided transmission (PAT), which uses $L$ signal-space dimensions of each fading
block to transmit a known pilot signal and the remaining $N - L$ dimensions to transmit the data
[7], [8]. In the sequel, we use the term "spectrally efficient" to describe a communication scheme
whose achievable rate expression has a pre-log factor matching that of the channel's ergodic capacity
expression (i.e., the channel's multiplexing gain).

⁴ Model (3) follows directly from (2) using the fact that $X^{(k)} = F^H \mathcal{D}(\sqrt{N} F x^{(k)}) F$.

⁵ The average subchannel SNR of $\rho$ follows from the fact that $\frac{1}{N} E\{\|h_f^{(k)}\|^2\} = 1$.
C. Our Contributions

In this paper, we study the fundamental limits of reliable communication over the unknown sparse
channel (1) in the high-SNR regime. First, we show that the ergodic capacity $C_{\text{sparse}}(\rho)$ obeys

    $\lim_{\rho \to \infty} \frac{C_{\text{sparse}}(\rho)}{\log \rho} = 1 - \frac{S}{N}$    (5)

for any sparsity $S$ such that $1 \le S \le L < N$. Comparing (5) to (4), it is interesting to notice
that the channel's multiplexing gain depends on the number of non-zero taps $S$ and not the channel
length $L$, even though the locations of these taps $\mathcal{L} \triangleq (\mathcal{L}^{(1)}, \dots, \mathcal{L}^{(K)})$ are unknown. Second,
we show that the sparse frequency-selective block-fading channel admits spectrally efficient PAT,
just as its non-sparse variant does. In other words, for an $S$-sparse channel, one can construct a
PAT scheme that uses only $S$ pilots per fading block to attain an achievable rate that grows with
SNR $\rho$ at the maximum possible rate, regardless of the channel length $L$. We establish this result
constructively, by specifying a particular OFDM-based PAT scheme and a corresponding decoder,
which, as we will see, can be interpreted as a joint channel-support/data decoder. Because our
decoder is computationally demanding (e.g., it requires the evaluation of up to $M^K = O(L^{SK})$
support hypotheses), we also consider a much simpler PAT decoder and find that its $\epsilon$-achievable rate
has a pre-log factor of $1 - \frac{S+1}{N}$, for any error-rate $\epsilon > 0$.

In stating the above pre-log factors, we emphasize that the overhead due to the OFDM prefix has
been ignored (for consistency with [7], [8]). If, instead, the overhead was included, then the pre-log
factor of the non-sparse channel's ergodic capacity (4) would read as $\frac{N-L}{N+L-1}$, and that for the sparse
channel (5) would read as $\frac{N-S}{N+L-1}$. Although the increase in pre-log factor resulting from channel
sparsity, i.e., $\frac{L-S}{N+L-1}$, is not as pronounced as when the prefix is ignored, i.e., $\frac{L-S}{N}$, the two values
are very similar when $N \gg L - 1$, which is the typical case in practice.
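
To make the comparison concrete, the pre-log factors above can be evaluated exactly with rational arithmetic. The helper names and the values $N = 127$, $L = 16$, $S = 4$ below are illustrative assumptions, not quantities taken from the paper.

```python
from fractions import Fraction

def prelog_ignoring_prefix(N, taps):
    """Pre-log factor 1 - taps/N (taps = L for non-sparse, S for sparse)."""
    return Fraction(N - taps, N)

def prelog_counting_prefix(N, L, taps):
    """Same factor when the L - 1 prefix samples are charged as overhead."""
    return Fraction(N - taps, N + L - 1)

N, L, S = 127, 16, 4   # illustrative values only

# Gain from sparsity when the prefix is ignored: (L - S)/N
gain_ignoring = prelog_ignoring_prefix(N, S) - prelog_ignoring_prefix(N, L)

# Gain from sparsity when the prefix is counted: (L - S)/(N + L - 1)
gain_counting = prelog_counting_prefix(N, L, S) - prelog_counting_prefix(N, L, L)
```

For these values the two gains are 12/127 and 12/142, which illustrates how close they are once $N \gg L - 1$.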

D. Relation to Compressed Channel Sensing 

The problem of communicating over sparse channels has recently gained a significant amount of
attention through the framework of compressed channel sensing (CCS), as seen by the recent overview
article [4] and the long list of papers cited therein. In CCS, it is assumed that pilots are embedded
during transmission, and that channel estimation is performed using pilot-only observations (i.e.,
without the aid of, or interference from, data). CCS then exploits channel sparsity to reduce the number
of pilots needed for accurate channel estimation, in the hopes of increasing spectral efficiency. As
an example, for the $N$-subcarrier OFDM scenario described by (3), CCS results [4] show that, when
$P = O(S_{\max} \ln^5 N)$ pilot subcarriers are chosen uniformly at random, any deterministic $L$-length
channel $h^{(k)}$ with sparsity at most $S_{\max}$ yields a CCS estimate $\hat{h}_{\text{ccs}}^{(k)}$ such that

    $\|\hat{h}_{\text{ccs}}^{(k)} - h^{(k)}\|^2 \le C\, \frac{S_{\max} \ln L}{\rho}$ with high probability,    (6)

where $C$ is a constant. The success probability in (6) grows with $L$ and $N$, but not with SNR $\rho$ (see
[4] for details). Furthermore, in the special case that the observations are noise-free, it is known that
exactly $2S$ data-free observations are both necessary and sufficient for perfect recovery [10].

In comparing the CCS approach to the approach that we have taken, we notice that the two
are fundamentally different. For example, CCS yields guarantees on the performance of channel
estimation, but not on the rate of reliable communication. Also, CCS attacks the channel estimation
problem using a "non-random parameter estimation" framework, whereas we approach channel
estimation using a "random parameter estimation" framework, since we consider ergodic capacity
and achievable rate, and are thus interested in average channel estimation performance. A potential
weakness of the CCS approach is that it uses only pilot observations for channel estimation, even
though the data-dependent observations contain valuable information about the unknown channel;
our work (and related empirical results in [11], [12]) suggests that significant gains can result
from the use of joint channel estimation and data decoding. Strengths of CCS include the facts that i)
CCS focuses on reconstruction techniques that have polynomial complexity in $L$ and $S_{\max}$; ii) CCS
focuses on reconstruction techniques that do not need to know the distributions of the signal and
noise; iii) CCS guarantees like (6), which hold for any sparsity $S \le S_{\max}$, can be further extended
to cover the case of approximately sparse (i.e., "compressible") signals [4, p. 5].

II. Noncoherent Capacity 

In this section, we characterize the ergodic noncoherent capacity of the sparse frequency-selective
block-fading channel described in Section I. We focus on the high-SNR regime, i.e., $\rho \to \infty$.

Theorem 1. The ergodic noncoherent capacity of the sparse frequency-selective block-fading channel,
$C_{\text{sparse}}(\rho)$, in bits per channel use, obeys $\lim_{\rho \to \infty} \frac{C_{\text{sparse}}(\rho)}{\log \rho} = 1 - \frac{S}{N}$ for sparsity $S$ and block
length $N$, whether or not the channel support realization $\mathcal{L}$ is known a priori.
Proof: Using the chain rule for mutual information [13], it follows straightforwardly that

    $I(y^{(k)}; x^{(k)}) = I(y^{(k)}; \mathcal{L}^{(k)}) + I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)}) - I(y^{(k)}; \mathcal{L}^{(k)} \mid x^{(k)})$,    (7)

where $I(a; b)$ denotes the mutual information between random vectors $a$ and $b$ and where $I(a; b \mid c)$
denotes the conditional mutual information between $a$ and $b$ conditioned on $c$. Then, since $\mathcal{L}^{(k)}$
takes one of $M$ values, we can bound the first term in (7) as follows:

    $I(y^{(k)}; \mathcal{L}^{(k)}) \le h(\mathcal{L}^{(k)}) \le \log M$,    (8)

where $h(a)$ denotes the entropy of $a$. Because $I(y^{(k)}; \mathcal{L}^{(k)} \mid x^{(k)}) \ge 0$, (7)-(8) yield the upper
bound $I(y^{(k)}; x^{(k)}) \le \log M + I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)})$. Similarly, since $I(y^{(k)}; \mathcal{L}^{(k)}) \ge 0$, equation (7)
implies that $I(y^{(k)}; x^{(k)}) \ge I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)}) - I(y^{(k)}; \mathcal{L}^{(k)} \mid x^{(k)})$ and, since
$I(y^{(k)}; \mathcal{L}^{(k)} \mid x^{(k)}) \le h(\mathcal{L}^{(k)} \mid x^{(k)}) \le \log M$, we also have that
$I(y^{(k)}; x^{(k)}) \ge I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)}) - \log M$. In summary,
we have that

    $I(y^{(k)}; x^{(k)}) = I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)}) + \Delta$ for $\Delta \in [-\log M, \log M]$.    (9)

Given knowledge of the support $\mathcal{L}^{(k)}$, the frequency-domain channel vector $h_f^{(k)}$ is zero-mean Gaussian
with a rank-$S$ covariance matrix. Thus, [8, Theorem 1] implies that the pre-log factor of $C_{\mathcal{L}}(\rho)$, the
ergodic noncoherent capacity under knowledge of the support $\mathcal{L}$, equals $1 - \frac{S}{N}$, i.e.,
$\lim_{\rho \to \infty} \frac{C_{\mathcal{L}}(\rho)}{\log \rho} = 1 - \frac{S}{N}$. Since

    $C_{\mathcal{L}}(\rho) = \max_{p(x_f^{(k)}) :\, E\|x_f^{(k)}\|^2 \le N} \frac{1}{N} I(y_f^{(k)}; x_f^{(k)} \mid \mathcal{L}^{(k)})$,    (10)

where $I(y_f^{(k)}; x_f^{(k)} \mid \mathcal{L}^{(k)}) = I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)})$ and where, due to (9), $I(y^{(k)}; x^{(k)} \mid \mathcal{L}^{(k)})$ differs from
$I(y^{(k)}; x^{(k)})$ by a bounded $\rho$-invariant constant $\Delta$, the ergodic noncoherent capacity

    $C_{\text{sparse}}(\rho) = \max_{p(x^{(k)}) :\, E\|x^{(k)}\|^2 \le N} \frac{1}{N} I(y^{(k)}; x^{(k)})$    (11)

must also obey $\lim_{\rho \to \infty} \frac{C_{\text{sparse}}(\rho)}{\log \rho} = 1 - \frac{S}{N}$. ■

It is interesting to notice that the channel multiplexing gain equals $1 - \frac{S}{N}$ whether or not the
support $\mathcal{L}$ is a priori known.
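
The reason the support uncertainty does not affect the pre-log factor is that the gap $\Delta$ in (9) is at most $\log M$ in magnitude, so its contribution to $\frac{1}{N} I(\cdot)/\log \rho$ vanishes as $\rho \to \infty$. A small numerical sketch of that vanishing term follows; the function name and the values of $L$, $S$, $N$ are illustrative assumptions.

```python
import math

def support_term(L, S, N, rho):
    """Upper bound on |Delta| / (N log rho) from (9), with Delta in [-log M, log M]."""
    M = math.comb(L, S)                      # number of candidate supports
    return math.log2(M) / (N * math.log2(rho))

# Evaluate at SNRs of 10, 20, 40 and 80 dB
values = [support_term(L=8, S=3, N=16, rho=10.0 ** e) for e in (1, 2, 4, 8)]
```

Each doubling of the SNR in dB halves the bound, so the normalized gap decays like $1/\log \rho$.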

III. Pilot Aided Transmission and Decoding 

For the non-sparse frequency-selective block-fading channel, it has been shown [7] that pilot aided
transmission (PAT) is spectrally efficient as defined in Section I-B, i.e., that it is possible to design a
PAT scheme for which the pre-log factor in its achievable rate expression coincides with the
pre-log factor in the noncoherent ergodic capacity expression (i.e., the channel multiplexing gain). The
question remains as to whether PAT is spectrally efficient for sparse channels as well.

Interestingly, Theorem 1 showed that the multiplexing gain of the sparse channel does not change
with knowledge of the channel support $\mathcal{L}$. Realizing⁶ that an $S$-sparse channel with known support
has the same capacity characteristics as a non-sparse length-$S$ channel, and recalling that PAT is
spectrally efficient for non-sparse channels, one might suspect that PAT is spectrally efficient for
sparse channels. As we shall see, this is indeed the case. To prove this, we construct an appropriate
PAT scheme and a corresponding decoder, as detailed in the following subsections.

A. PAT Definition 

For the transmission scheme outlined in Section I-A, we consider a PAT scheme in which the
elements in the frequency-domain transmission vector $x_f^{(k)} \in \mathbb{C}^N$ can be partitioned into a pilot
vector $x_p \in \mathbb{C}^P$, created from $\{x_f^{(k)}[n] : n \in \mathcal{N}_p\}$, and a data vector $x_d^{(k)} \in \mathbb{C}^{N-P}$, created from
$\{x_f^{(k)}[n] : n \in \mathcal{N}_d\}$. Here, we use $\mathcal{N}_p \subset \{0, \dots, N-1\}$ to denote the pilot subcarrier indices and $\mathcal{N}_d$
to denote the corresponding data subcarrier indices, where $\mathcal{N}_d = \{0, \dots, N-1\} \setminus \mathcal{N}_p$. Notice that
exactly $P$ signal-space dimensions (per fading block) have been allocated to pilots, i.e., $|\mathcal{N}_p| = P$.
For simplicity, we assume that the pilot locations $\mathcal{N}_p$ and pilot values $x_p$ do not change with the
fading block $k$, and that the pilot values are constant modulus, i.e., $|x_p[n]| = 1$. By definition, the
pilot quantities $x_p$ and $\mathcal{N}_p$ are known a priori to the decoder.

In the parallel subchannel model (3), we partition both $y_f^{(k)} \in \mathbb{C}^N$ and $v_f^{(k)} \in \mathbb{C}^N$ in the same
way as we did $x_f^{(k)} \in \mathbb{C}^N$, yielding

    $y_p^{(k)} = \sqrt{\rho}\, \mathcal{D}(x_p)\, J_p h_f^{(k)} + v_p^{(k)}$    (12)

    $y_d^{(k)} = \sqrt{\rho}\, \mathcal{D}(x_d^{(k)})\, J_d h_f^{(k)} + v_d^{(k)}$,    (13)

where $J_p$ is a selection matrix constructed from rows $\mathcal{N}_p$ of the $N \times N$ identity matrix, and $J_d$
is constructed similarly from rows $\mathcal{N}_d$ of the identity matrix. Another way to write $y_p^{(k)}$ and $y_d^{(k)}$,

⁶ The equivalence in pre-log factor between an $S$-sparse channel with known support and a non-sparse length-$S$ channel
follows directly from [8, Theorem 1] and the fact that, in both cases, $h_f^{(k)}$ is zero-mean Gaussian with rank-$S$ covariance
matrix.
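which will be useful in the sequel, is

    $y_p^{(k)} = \sqrt{\rho N}\, \mathcal{D}(x_p)\, F_{p,\text{true}}^{(k)} h_{nz}^{(k)} + v_p^{(k)}$    (14)

    $y_d^{(k)} = \sqrt{\rho N}\, \mathcal{D}(x_d^{(k)})\, F_{d,\text{true}}^{(k)} h_{nz}^{(k)} + v_d^{(k)}$,    (15)

where $h_{nz}^{(k)} \in \mathbb{C}^S$ is formed from the non-zero elements of $h^{(k)}$, $F_{p,\text{true}}^{(k)}$ is formed from rows $\mathcal{N}_p$
and columns $\mathcal{L}^{(k)}$ of the DFT matrix $F$, and $F_{d,\text{true}}^{(k)}$ is formed from rows $\mathcal{N}_d$ and columns $\mathcal{L}^{(k)}$ of
$F$. Notice that, because $\mathcal{L}^{(k)}$ is not a priori known to the decoder, neither are $F_{p,\text{true}}^{(k)}$ or $F_{d,\text{true}}^{(k)}$.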
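
The equivalence between the selection-matrix form (12)-(13) and the submatrix form (14)-(15) amounts to $J_p h_f^{(k)} = \sqrt{N} F_{p,\text{true}}^{(k)} h_{nz}^{(k)}$ (and likewise for the data tones). A minimal numerical check is sketched below; the pilot pattern, the support, and the sizes are arbitrary illustrative assumptions.

```python
import numpy as np

N, L, S = 7, 4, 2
pilots = [0, 3]                                   # assumed pilot subcarriers N_p (P = 2)
data = [n for n in range(N) if n not in pilots]   # data subcarriers N_d
support = [1, 3]                                  # assumed true support for this block

rng = np.random.default_rng(2)
h_nz = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2 * S)
h = np.zeros(N, dtype=complex)                    # zero-padded impulse response
h[support] = h_nz

F = np.fft.fft(np.eye(N)) / np.sqrt(N)            # unitary DFT matrix
h_f = np.sqrt(N) * F @ h                          # frequency-domain gains

# Rows N_p (resp. N_d) and columns `support` of F
F_p_true = F[np.ix_(pilots, support)]
F_d_true = F[np.ix_(data, support)]

pilot_gains = h_f[pilots]                         # J_p h_f
data_gains = h_f[data]                            # J_d h_f
```

Indexing `h_f` with the pilot (or data) indices plays the role of the selection matrices $J_p$ and $J_d$.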

To achieve an arbitrarily small probability of decoding error, we construct codewords that span
$K$ blocks, where $K$ is arbitrarily large. Thus, using $\mathcal{C} \subset \mathbb{C}^{K(N-P)}$ to denote our codebook, we
partition each codeword $x_d \in \mathcal{C}$ into $K$ data vectors, i.e., $x_d = [x_d^{(1)T}, \dots, x_d^{(K)T}]^T$, for use in our
PAT scheme. The codewords $x_d$ are generated independently from a Gaussian distribution such that
$x_d^{(k)}$ has positive definite covariance matrix $R_d$ for all $k$, and such that $x_d^{(k)}$ is independent of
$x_d^{(k')}$ for $k' \ne k$. Denoting the number of codewords in the codebook by $|\mathcal{C}|$, the average data rate
(in bits per channel use) is given by $\mathcal{R} = \frac{1}{KN} \log |\mathcal{C}|$.

B. Optimal Decoding for PAT 

The reader may naturally wonder: what is the optimal decoder for the above PAT scheme in the
case of the sparse channel described in Section I, and how does it compare to optimal decoding
in the non-sparse case? To answer these questions, we detail the optimal decoder for the sparse
and non-sparse cases below. In the sequel, we use $F_i \in \mathbb{C}^{N \times S}$ to denote the matrix formed from
columns $\mathcal{L}_i$ of the DFT matrix $F$, we use $F_{p,i} \in \mathbb{C}^{P \times S}$ to denote the matrix formed from rows $\mathcal{N}_p$
of $F_i$, and we use $F_{d,i} \in \mathbb{C}^{(N-P) \times S}$ to denote the matrix formed from rows $\mathcal{N}_d$ of $F_i$.

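Lemma 1. The maximum likelihood decoder for PAT over the $S$-sparse $L$-length frequency-selective
$N$-block-fading channel takes the form

    $\hat{x}_d^{\text{ML}} = \arg\max_{x_d \in \mathcal{C}} \prod_{k=1}^{K} \sum_{i=1}^{M} \lambda_{p,i}^{(k)} \det\big(\rho N F_{d,i}^H \mathcal{D}(x_d^{(k)} \odot x_d^{(k)*}) F_{d,i} + \Sigma_{nz,p,i}^{-1}\big)^{-1}$
        $\times \exp\big(-\|y_d^{(k)} - \sqrt{\rho N}\, \mathcal{D}(x_d^{(k)}) F_{d,i}\, \hat{h}_{nz,i}^{(k)}(x_d^{(k)})\|^2 - \|\hat{h}_{nz,i}^{(k)}(x_d^{(k)}) - \hat{h}_{nz,p,i}^{(k)}\|^2_{\Sigma_{nz,p,i}^{-1}}\big)$,    (16)

where $\lambda_{p,i}^{(k)} \triangleq \Pr\{\mathcal{L}^{(k)} = \mathcal{L}_i \mid y_p^{(k)}, x_p\}$ is the pilot-aided channel-support posterior, where $\hat{h}_{nz,p,i}^{(k)}$
is the $\mathcal{L}_i$-conditional pilot-aided MMSE estimate of $h_{nz}^{(k)}$ and $\Sigma_{nz,p,i}$ is its error covariance, which
take the form

    $\hat{h}_{nz,p,i}^{(k)} = \sqrt{\tfrac{\rho}{N}}\, F_{p,i}^H \big(\rho F_{p,i} F_{p,i}^H + \tfrac{S}{N} I\big)^{-1} \mathcal{D}(x_p^*)\, y_p^{(k)}$,    (17)

    $\Sigma_{nz,p,i} = \tfrac{1}{S}\big(I - F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} F_{p,i}\big)$,    (18)

and where $\hat{h}_{nz,i}^{(k)}(x_d^{(k)})$ denotes the MMSE estimate of $h_{nz}^{(k)}$ conditioned on the data hypothesis $x_d^{(k)}$
and based on the pilot-aided channel statistics (17)-(18), i.e.,

    $\hat{h}_{nz,i}^{(k)}(x_d^{(k)}) = \hat{h}_{nz,p,i}^{(k)} + \sqrt{\rho N}\, \Sigma_{nz,p,i} F_{d,i}^H \mathcal{D}(x_d^{(k)*}) \big(\rho N\, \mathcal{D}(x_d^{(k)}) F_{d,i} \Sigma_{nz,p,i} F_{d,i}^H \mathcal{D}(x_d^{(k)*}) + I\big)^{-1}$
        $\times \big(y_d^{(k)} - \sqrt{\rho N}\, \mathcal{D}(x_d^{(k)}) F_{d,i}\, \hat{h}_{nz,p,i}^{(k)}\big)$.    (19)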

Proof: See Appendix A. ■

Paraphrasing Lemma 1, the optimal decoder (16) for sparse-channel PAT first uses pilots to
compute support posteriors $\{\lambda_{p,i}^{(k)}\}_{i=1}^{M}$ and support-conditional channel posteriors⁷ $\{\hat{h}_{nz,p,i}^{(k)}, \Sigma_{nz,p,i}\}_{i=1}^{M}$ for
each fading block $k$. Then, it averages over the $M$ support hypotheses to obtain a joint data-channel
decoding metric for each fading block $k$. Finally, it searches for the codeword that maximizes the
product of the decoding metrics (over all fading blocks $k$). We note that optimal decoding is an
example of Bayes model averaging [14] and differs markedly from the decoding approach implied
in the compressed channel sensing (CCS) framework [4], which aims to compute a single sparse
channel estimate (one tap estimate under one support estimate) for later use in data decoding. We
also note that ML decoding complexity is⁸ $O(|\mathcal{C}| M K N^3)$.
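
The pilot-only stage of this decoder can be sketched as follows: for every candidate support, evaluate the MMSE estimate (17), its error covariance (18), and the support posterior. This is a minimal illustration under stated assumptions, not the paper's implementation. The function name is ours, the prior defaults to uniform, and the posterior is computed from the Gaussian likelihood of $\mathcal{D}(x_p^*) y_p^{(k)}$ under each hypothesis, which follows from (14) and the tap statistics.

```python
import itertools
import numpy as np

def pilot_aided_statistics(y_p, x_p, pilots, N, L, S, rho, prior=None):
    """Per-support MMSE estimates (17), error covariances (18), and support posteriors."""
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)              # unitary DFT matrix
    supports = list(itertools.combinations(range(L), S))
    M, P = len(supports), len(pilots)
    prior = np.full(M, 1.0 / M) if prior is None else np.asarray(prior, float)
    z = np.conj(x_p) * y_p                              # D(x_p^*) y_p, using |x_p[n]| = 1
    h_hat, Sigma, log_post = [], [], []
    for lam, sup in zip(prior, supports):
        F_pi = F[np.ix_(pilots, sup)]
        G = F_pi @ F_pi.conj().T
        h_hat.append(np.sqrt(rho / N) * F_pi.conj().T
                     @ np.linalg.solve(rho * G + (S / N) * np.eye(P), z))             # (17)
        Sigma.append((np.eye(S) - F_pi.conj().T
                      @ np.linalg.solve(G + (S / (rho * N)) * np.eye(P), F_pi)) / S)  # (18)
        # Under hypothesis i, z ~ CN(0, (rho N / S) F_pi F_pi^H + I)
        C = (rho * N / S) * G + np.eye(P)
        log_post.append(np.log(lam) - np.linalg.slogdet(C)[1]
                        - np.real(np.vdot(z, np.linalg.solve(C, z))))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())            # normalize in the log domain
    return supports, post / post.sum(), h_hat, Sigma

# Small example: N = 7, L = 4, S = 2, P = 3 pilots, true support {1, 3}
N, L, S, rho, pilots = 7, 4, 2, 1e4, [0, 1, 2]
rng = np.random.default_rng(3)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
h_nz = np.array([0.6 + 0.2j, -0.3 + 0.7j])
x_p = np.ones(len(pilots), dtype=complex)
v_p = (rng.standard_normal(3) + 1j * rng.standard_normal(3)) / np.sqrt(2)
y_p = np.sqrt(rho * N) * x_p * (F[np.ix_(pilots, [1, 3])] @ h_nz) + v_p
supports, post, h_hat, Sigma = pilot_aided_statistics(y_p, x_p, pilots, N, L, S, rho)
```

Enumerating all $M$ supports is only practical for small $L$ and $S$, which is consistent with the complexity remark above.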

For illustrative purposes, we compare the optimal decoder for a sparse channel (as specified in
Lemma 1 above) to the optimal decoder for a non-sparse channel, as detailed below in Corollary 1.

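Corollary 1. The maximum likelihood decoder for PAT over the non-sparse $L$-length
frequency-selective $N$-block-fading channel takes the form

    $\hat{x}_d^{\text{ML}} = \arg\min_{x_d \in \mathcal{C}} \sum_{k=1}^{K} \Big( \ln\det\big(\rho N F_d^H \mathcal{D}(x_d^{(k)} \odot x_d^{(k)*}) F_d + \Sigma_{nz,p}^{-1}\big)$
        $+ \|y_d^{(k)} - \sqrt{\rho N}\, \mathcal{D}(x_d^{(k)}) F_d\, \hat{h}_{nz}^{(k)}(x_d^{(k)})\|^2 + \|\hat{h}_{nz}^{(k)}(x_d^{(k)}) - \hat{h}_{nz,p}^{(k)}\|^2_{\Sigma_{nz,p}^{-1}} \Big)$,    (20)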

⁷ Note that $\{\Sigma_{nz,p,i}\}_{i=1}^{M}$ can be precomputed since they do not depend on the observations.

⁸ The term after the sum in (16) must be computed for every triple $(i, k, x_d^{(k)})$, where the complexity of each computation
is $O(N^3)$ due to the matrix inversion in (19).
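where $\hat{h}_{nz,p}^{(k)}$ is the pilot-aided MMSE estimate of $h_{nz}^{(k)}$ and $\Sigma_{nz,p}$ is its error covariance, which take
the form

    $\hat{h}_{nz,p}^{(k)} \triangleq \sqrt{\tfrac{\rho}{N}}\, F_p^H \big(\rho F_p F_p^H + \tfrac{L}{N} I\big)^{-1} \mathcal{D}(x_p^*)\, y_p^{(k)}$,    (21)

    $\Sigma_{nz,p} \triangleq \tfrac{1}{L}\big(I - F_p^H (F_p F_p^H + \tfrac{L}{\rho N} I)^{-1} F_p\big)$,    (22)

with $F_p$ and $F_d$ formed from rows $\mathcal{N}_p$ and $\mathcal{N}_d$, respectively, of the first $L$ columns of $F$,
and where $\hat{h}_{nz}^{(k)}(x_d^{(k)})$ denotes the MMSE estimate of $h_{nz}^{(k)}$ conditioned on the data hypothesis $x_d^{(k)}$
and based on the pilot-aided channel statistics (21)-(22), i.e.,

    $\hat{h}_{nz}^{(k)}(x_d^{(k)}) = \hat{h}_{nz,p}^{(k)} + \sqrt{\rho N}\, \Sigma_{nz,p} F_d^H \mathcal{D}(x_d^{(k)*}) \big(\rho N\, \mathcal{D}(x_d^{(k)}) F_d \Sigma_{nz,p} F_d^H \mathcal{D}(x_d^{(k)*}) + I\big)^{-1}$
        $\times \big(y_d^{(k)} - \sqrt{\rho N}\, \mathcal{D}(x_d^{(k)}) F_d\, \hat{h}_{nz,p}^{(k)}\big)$.    (23)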






To paraphrase Corollary 1, the optimal decoder (20) for non-sparse-channel PAT computes a
single pilot-aided MMSE channel estimate $\hat{h}_{nz,p}^{(k)}$, which is then used to construct a joint data-channel
decoding metric, for each fading block $k$. Finally, it searches for the codeword that minimizes the
sum of the decoding metrics (over $k$). It can be seen that optimal decoding in the sparse case differs
from that in the non-sparse case by the need to compute, at each fading block $k$, the support
posteriors $\{\lambda_{p,i}^{(k)}\}_{i=1}^{M}$ and the corresponding support-conditional tap estimates $\{\hat{h}_{nz,p,i}^{(k)}\}_{i=1}^{M}$ and then
average the decoding metrics over the $M$ support hypotheses.



C. Decoupled Decoding of PAT 

For both sparse and non-sparse channels, the optimal decoder of PAT, as detailed in Section III-B,
takes the form of a joint channel/data decoder. In practice, for reasons of simplicity, decoding is often
decoupled into two stages: i) pilot-aided channel estimation and ii) coherent data decoding based on
the channel estimate. We now detail a decoupled decoder for the sparse channel of Section I and the
PAT scheme of Section III-A that, while suboptimal, performs well enough to yield spectrally efficient
communication when provided with the correct value of the channel support $\mathcal{L}$. In the sequel, we
will refer to the case of known $\mathcal{L}$ as the support-genie case. Later, in Sections IV-A and IV-B, we
will propose schemes to reliably infer the support $\mathcal{L}$.

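For our decoupled decoder, pilot-aided channel estimation is accomplished in a support-hypothesized
manner. More precisely, we compute, at each fading block $k$, the pilot-aided MMSE estimate
$\hat{h}_{f,p,i_k}^{(k)}$ of the frequency-domain channel $h_f^{(k)}$ under channel-support hypothesis $\mathcal{L}^{(k)} = \mathcal{L}_{i_k}$. To do this, we set
$\hat{h}_{f,p,i_k}^{(k)} = \sqrt{N} F_{i_k} \hat{h}_{nz,p,i_k}^{(k)}$ for the $\hat{h}_{nz,p,i_k}^{(k)}$ specified by (17). Note that $\hat{h}_{f,p,i_k}^{(k)}$ is a linear estimate due
to the fact that $h_f^{(k)}$ becomes Gaussian when conditioned on a particular support. In contrast, the
(support-unconditional) pilot-aided MMSE estimate of $h_f^{(k)}$ is in general non-linear. The
support-hypothesized channel estimates $\{\hat{h}_{f,p,i_k}^{(k)}\}_{k=1}^{K}$ and their covariances $\{\Sigma_{f,p,i_k}\}_{k=1}^{K}$ are then used in
coherent data decoding. (Note that $\Sigma_{f,p,i_k} = N F_{i_k} \Sigma_{nz,p,i_k} F_{i_k}^H$, where $\Sigma_{nz,p,i_k}$ is given by (18).) For
coherent data decoding, we employ the weighted minimum-distance (WMD) decoder, defined [15]
as

    $\hat{x}_d^{\text{WMD}} = \arg\min_{x_d \in \mathcal{C}} \sum_{k=1}^{K} \big\|Q_{i_k}^{(k)} \big(y_d^{(k)} - \sqrt{\rho}\, \mathcal{D}(x_d^{(k)}) J_d \hat{h}_{f,p,i_k}^{(k)}\big)\big\|^2$,    (24)

where $Q_{i_k}^{(k)}$ is a weighting matrix and $i = (i_1, \dots, i_K)$. Writing the observation as

    $y_d^{(k)} = \sqrt{\rho}\, \mathcal{D}(x_d^{(k)}) J_d \hat{h}_{f,p,i_k}^{(k)} + e_{d,i_k}^{(k)}$, with $e_{d,i_k}^{(k)} \triangleq \sqrt{\rho}\, \mathcal{D}(x_d^{(k)}) J_d \tilde{h}_{f,p,i_k}^{(k)} + v_d^{(k)}$    (25)

and $\tilde{h}_{f,p,i_k}^{(k)} \triangleq h_f^{(k)} - \hat{h}_{f,p,i_k}^{(k)}$,
the standard [15] choice for $Q_{i_k}^{(k)}$ is a whitening matrix for the "effective noise" $e_{d,i_k}^{(k)}$. We note that
the covariance $C_{d,i_k} = \mathrm{cov}\{e_{d,i_k}^{(k)}\}$ (and thus $Q_{i_k}^{(k)}$) depends on $\Sigma_{f,p,i_k}$, $R_d$, and $\rho$.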

For the achievable rate of the decoupled-decoder PAT system to grow logarithmically with $\rho$, the
effective noise $e_{d,i_k}^{(k)}$ must satisfy certain properties. Towards this aim, we establish that, with $P \ge S$
pilot tones, the support-hypothesized channel estimation error variance decays at the rate of $\frac{1}{\rho}$ as
$\rho \to \infty$, if and only if the support hypothesis is correct.

Lemma 2. Say that $N$ is prime. Then, for any pilot pattern $\mathcal{N}_p$ such that $P \ge S$, there exists a
constant $C$ such that the channel estimation error obeys $E\{\|\tilde{h}_{f,p,i}^{(k)}\|^2\} \le C \rho^{-1}$ for all $\rho > 0$ if and
only if $\mathcal{L}_i = \mathcal{L}_{\text{true}}^{(k)}$, i.e., $\mathcal{L}_i$ is the true channel support of the $k$th block.

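Proof: We begin by recalling that, under support hypothesis $\mathcal{L}^{(k)} = \mathcal{L}_i$, the frequency-domain
channel coefficients $h_f^{(k)}$ are related to the non-zero channel taps $h_{nz}^{(k)}$ via $h_f^{(k)} = \sqrt{N} F_i h_{nz}^{(k)}$, where
$F_i$ contains columns $\mathcal{L}_i$ of the unitary DFT matrix $F$. Thus, $\hat{h}_{f,p,i}^{(k)}$, the $\mathcal{L}_i$-conditional pilot-aided
MMSE estimate of $h_f^{(k)}$, is related to $\hat{h}_{nz,p,i}^{(k)}$, the $\mathcal{L}_i$-conditional pilot-aided MMSE estimate of $h_{nz}^{(k)}$,
via $\hat{h}_{f,p,i}^{(k)} = \sqrt{N} F_i \hat{h}_{nz,p,i}^{(k)}$. Because the columns of $F_i$ are orthonormal, the estimation error obeys

    $\|\tilde{h}_{f,p,i}^{(k)}\|^2 = \|h_f^{(k)} - \hat{h}_{f,p,i}^{(k)}\|^2 = N \|h_{nz}^{(k)} - \hat{h}_{nz,p,i}^{(k)}\|^2 = N \|\tilde{h}_{nz,p,i}^{(k)}\|^2$.    (26)

Plugging (14) into (17), the estimation error $\tilde{h}_{nz,p,i}^{(k)} = h_{nz}^{(k)} - \hat{h}_{nz,p,i}^{(k)}$ becomes

    $\tilde{h}_{nz,p,i}^{(k)} = \big(I - F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} F_{p,\text{true}}^{(k)}\big) h_{nz}^{(k)} - \tfrac{1}{\sqrt{\rho N}} F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} \mathcal{D}(x_p^*)\, v_p^{(k)}$.    (27)

Then, since $h_{nz}^{(k)}$ is independent of $v_p^{(k)}$,

    $E\{\|\tilde{h}_{nz,p,i}^{(k)}\|^2\} = \tfrac{1}{S} \mathrm{tr}\Big\{\big(I - F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} F_{p,\text{true}}^{(k)}\big)$
        $\times \big(I - F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} F_{p,\text{true}}^{(k)}\big)^H\Big\}$
        $+ \tfrac{1}{\rho N} \mathrm{tr}\big\{F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-2} F_{p,i}\big\}$.    (28)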

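We now make a few observations about $F_{p,i}$ and $F_{p,\text{true}}^{(k)}$. When $N$ is prime, the Chebotarev theorem
[16], [17] guarantees that any square submatrix of the $N$-DFT matrix $F$ will be full rank. Hence, any
tall submatrix of $F$ will also be full rank. Then, because $P \ge S$, it follows that $F_{p,i} \in \mathbb{C}^{P \times S}$ will
be full rank for all $i$, as will $F_{p,\text{true}}^{(k)}$. Furthermore, when $\mathcal{L}_i \ne \mathcal{L}_{\text{true}}^{(k)}$, it follows that $F_{p,i} \ne F_{p,\text{true}}^{(k)}$.
To proceed, we use the singular value decomposition $F_{p,i} = U_i \Sigma_i V_i^H$, where $\Sigma_i \in \mathbb{C}^{P \times S}$ is a
full-rank diagonal matrix and where $U_i$ and $V_i$ are both unitary. Then

    $F_{p,i}^H (F_{p,i} F_{p,i}^H + \tfrac{S}{\rho N} I)^{-1} = V_i \underbrace{\Sigma_i^H (\Sigma_i \Sigma_i^H + \tfrac{S}{\rho N} I)^{-1}}_{\triangleq\, D_i^H} U_i^H$,    (29)

where $D_i \in \mathbb{C}^{P \times S}$ is full-rank diagonal with non-zero elements $\big\{\frac{\sigma_{i,l}}{\sigma_{i,l}^2 + S/(\rho N)}\big\}_{l=1}^{S}$, using $\sigma_{i,l}$ to
denote the $l$th singular value in $\Sigma_i$.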

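In the case that $\mathcal{L}_i = \mathcal{L}_{\text{true}}^{(k)}$, we have $F_{p,\text{true}}^{(k)} = F_{p,i}$, and so

    $E\{\|\tilde{h}_{nz,p,i}^{(k)}\|^2\} = \tfrac{1}{S} \mathrm{tr}\{(I - V_i D_i^H \Sigma_i V_i^H)(I - V_i \Sigma_i^H D_i V_i^H)\} + \tfrac{1}{\rho N} \mathrm{tr}\{V_i D_i^H D_i V_i^H\}$    (30)
        $= \tfrac{1}{S} \mathrm{tr}\{(I - D_i^H \Sigma_i)(I - \Sigma_i^H D_i)\} + \tfrac{1}{\rho N} \mathrm{tr}\{D_i^H D_i\}$    (31)
        $= \tfrac{1}{S} \sum_{l=1}^{S} \Big(1 - \frac{\sigma_{i,l}^2}{\sigma_{i,l}^2 + S/(\rho N)}\Big)^2 + \tfrac{1}{\rho N} \sum_{l=1}^{S} \frac{\sigma_{i,l}^2}{(\sigma_{i,l}^2 + S/(\rho N))^2}$    (32)
        $= \sum_{l=1}^{S} \frac{1}{\rho N \sigma_{i,l}^2 + S}$    (33)
        $\le \tfrac{1}{\rho N} \sum_{l=1}^{S} \frac{1}{\sigma_{i,l}^2}$.    (34)

Thus, we have the upper bound $E\{\|\tilde{h}_{f,p,i}^{(k)}\|^2\} = N E\{\|\tilde{h}_{nz,p,i}^{(k)}\|^2\} \le C \rho^{-1}$ with $C = \sum_{l=1}^{S} \sigma_{i,l}^{-2}$.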

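For the case $\mathcal{L}_i \ne \mathcal{L}_{\text{true}}^{(k)}$, we have $F_{p,\text{true}}^{(k)} \ne F_{p,i}$, and so we can use the previously defined SVD
quantities to write $F_{p,\text{true}}^{(k)} = U_i (\Sigma_i + \Delta_i) V_i^H$, where $\Delta_i \in \mathbb{C}^{P \times S}$ is some non-zero matrix. It then
follows that

    $E\{\|\tilde{h}_{nz,p,i}^{(k)}\|^2\} = \tfrac{1}{S} \mathrm{tr}\{(I - V_i D_i^H (\Sigma_i + \Delta_i) V_i^H)(I - V_i (\Sigma_i + \Delta_i)^H D_i V_i^H)\}$
        $+ \tfrac{1}{\rho N} \mathrm{tr}\{V_i D_i^H D_i V_i^H\}$    (35)
        $= \tfrac{1}{S} \mathrm{tr}\{(I - D_i^H \Sigma_i - D_i^H \Delta_i)(I - \Sigma_i^H D_i - \Delta_i^H D_i)\}$
        $+ \tfrac{1}{\rho N} \mathrm{tr}\{D_i^H D_i\}$    (36)
        $= E\{\|\tilde{h}_{nz,p,\text{true}}^{(k)}\|^2\} - \tfrac{1}{S} \mathrm{tr}\{(I - D_i^H \Sigma_i) \Delta_i^H D_i + D_i^H \Delta_i (I - \Sigma_i^H D_i)\}$
        $+ \tfrac{1}{S} \mathrm{tr}\{D_i^H \Delta_i \Delta_i^H D_i\}$.    (37)

As established above, $E\{\|\tilde{h}_{nz,p,\text{true}}^{(k)}\|^2\} \to 0$ as $\rho \to \infty$. Since $I - D_i^H \Sigma_i$ is diagonal with elements
$\{(1 + \rho N \sigma_{i,l}^2 / S)^{-1}\}_{l=1}^{S}$, the second term in (37) also vanishes as $\rho \to \infty$. The third term in (37), however,
converges to the quantity $\tfrac{1}{S} \mathrm{tr}\{\Sigma_i^+ \Delta_i \Delta_i^H \Sigma_i^{+H}\}$ as $\rho \to \infty$, where $(\cdot)^+$ denotes pseudo-inverse.
Now, since $F_{p,\text{true}}^{(k)}$ and $F_{p,i}$ are distinct full rank matrices with $\mathrm{tr}\{F_{p,\text{true}}^{(k)H} F_{p,\text{true}}^{(k)}\} = \mathrm{tr}\{F_{p,i}^H F_{p,i}\}$,
it follows that $\Sigma_i^+ \Delta_i \ne 0$ and hence $\mathrm{tr}\{\Sigma_i^+ \Delta_i \Delta_i^H \Sigma_i^{+H}\} > 0$. So there does not exist $C$ such that
$E\{\|\tilde{h}_{nz,p,i}^{(k)}\|^2\} \le C \rho^{-1}$ for all $\rho > 0$. ■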
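
The dichotomy in Lemma 2 can be observed numerically by evaluating the trace expression (28) for a correct and an incorrect support hypothesis. The sketch below is illustrative only: the sizes ($N = 7$, $S = 2$), the pilot pattern, and the two supports are assumptions chosen for the example, and the function name is ours.

```python
import numpy as np

def hypothesized_error(F_pi, F_ptrue, rho, N, S):
    """Evaluate E||h_nz - h_hat_{nz,p,i}||^2 via the trace expression (28)."""
    P = F_pi.shape[0]
    B = F_pi.conj().T @ np.linalg.inv(F_pi @ F_pi.conj().T + (S / (rho * N)) * np.eye(P))
    E = np.eye(S) - B @ F_ptrue
    return (np.trace(E @ E.conj().T) / S + np.trace(B @ B.conj().T) / (rho * N)).real

N, S = 7, 2                                   # N prime, as Lemma 2 requires
pilots = [0, 1]                               # P = S = 2 pilot tones (assumed pattern)
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
F_true = F[np.ix_(pilots, [0, 1])]            # true support {0, 1}
F_wrong = F[np.ix_(pilots, [0, 2])]           # incorrect hypothesis {0, 2}

snrs = (1e2, 1e4, 1e6)
err_true = [hypothesized_error(F_true, F_true, rho, N, S) for rho in snrs]
err_wrong = [hypothesized_error(F_wrong, F_true, rho, N, S) for rho in snrs]

# Closed form (33) for the correct hypothesis, from the singular values of F_{p,i}
sigma = np.linalg.svd(F_true, compute_uv=False)
closed_form = [float(np.sum(1.0 / (rho * N * sigma**2 + S))) for rho in snrs]
```

Under the correct hypothesis the error shrinks by roughly a factor of 100 for every 20 dB of SNR, while under the incorrect hypothesis it settles at a non-zero floor, in line with the lemma.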

Corollary 2. Lemma 2, and several other results in the paper, are stated under prime $N$, arbitrary
$\mathcal{N}_p$, and $L < N$. The requirement that $N$ is prime can be relaxed in exchange for the following
restrictions on $\mathcal{N}_p$ and $L$.

1) The set $\mathcal{N}_p$ does not form a group with respect to modulo-$N$ addition, nor a coset of a subgroup
of $\{0, 1, \dots, N-1\}$ under modulo-$N$ addition.

2) The channel length $L$ obeys $L < N/2$.

Proof: Throughout the paper, the prime-$N$ property is used only to guarantee that certain square
submatrices of the $N$-DFT matrix $F$ remain full rank. When forming these submatrices, we use
$S$ row indices from $\mathcal{N}_p$ (where $\mathcal{N}_p \subset \{0, \dots, N-1\}$ and $|\mathcal{N}_p| = P \ge S$) and $S$ column indices
from $\mathcal{L}_i$ (where $\mathcal{L}_i \subset \{0, \dots, L-1\}$ and $|\mathcal{L}_i| = S$). In the case that $N$ is prime, the Chebotarev
theorem [16], [17] guarantees that our square submatrix will be full rank, as discussed in the proof
of Lemma 2. However, even when $N$ is not prime, our square submatrix will be full rank whenever
both $\mathcal{N}_p$ and $\mathcal{L}_i$ do not form groups with respect to modulo-$N$ addition, nor cosets of subgroups of
$\{0, 1, \dots, N-1\}$ w.r.t. modulo-$N$ addition [16, p. 491]. These conditions on $\mathcal{N}_p$ and $\mathcal{L}_i$ are ensured
by the two conditions stated in the corollary. ■
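
The role of the prime-$N$ assumption can be illustrated by brute force for very small $N$: every square submatrix of the DFT matrix is nonsingular when $N$ is prime, whereas a composite $N$ admits singular ones (for $N = 4$, rows $\{0, 2\}$ and columns $\{0, 2\}$ form a subgroup and give a rank-one block). The sketch below is an exhaustive check, feasible only for tiny $N$; the function name is ours.

```python
import itertools
import numpy as np

def min_singular_value_over_square_submatrices(N):
    """Smallest singular value found among all square submatrices of the N-DFT matrix."""
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)
    smallest = np.inf
    for size in range(1, N + 1):
        for rows in itertools.combinations(range(N), size):
            for cols in itertools.combinations(range(N), size):
                s = np.linalg.svd(F[np.ix_(rows, cols)], compute_uv=False)
                smallest = min(smallest, s[-1])
    return smallest

prime_case = min_singular_value_over_square_submatrices(5)      # N = 5 is prime
composite_case = min_singular_value_over_square_submatrices(4)  # N = 4 is composite
```

A strictly positive minimum for $N = 5$ and a numerically zero minimum for $N = 4$ match the two regimes discussed in Corollary 2.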
For a given communication scheme, we say that a rate $\mathcal{R}$ (in bits per channel use) is achievable if
the probability of decoding error can be made arbitrarily small at that rate. Now, using the bound on
the estimation error variance from Lemma 2, we establish that when the true channel support is a priori
known at the receiver (i.e., the support-genie case), the achievable rates satisfy $\lim_{\rho \to \infty} \frac{\mathcal{R}(\rho)}{\log \rho} = 1 - \frac{P}{N}$,
where $P \ge S$ denotes the number of pilot tones.

Lemma 3. Say that $N$ is prime, and that the true channel support is known a priori at the receiver
for each fading block. Then, for any pilot pattern $\mathcal{N}_p$ such that $P \ge S$, the achievable rate of the
support-hypothesized estimator-decoder satisfies $\lim_{\rho \to \infty} \frac{\mathcal{R}(\rho)}{\log \rho} = 1 - \frac{P}{N}$.

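Proof: The achievable rate of WMD decoding under imperfect channel state information (CSI) and Gaussian coding was studied in [15], where the rate expressions were obtained under certain restrictions on the statistical properties of the imperfect CSI. In the support-genie case, our support-hypothesized channel estimator satisfies all of the standard requirements in [15] except for time-invariance, since the support varies over the fading blocks. However, our model does satisfy the alternative ergodic condition in [15]. To see this, we need to verify that, for any function $f(\cdot)$, we have $\lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big) = E\big\{f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big)\big\}$, using $i_{k,\text{true}}$ to denote the index of the true support during the $k$th fading block, and $\hat{h}^{(k)}_{f,p,i_{k,\text{true}}} = \sqrt{N} F_{i_{k,\text{true}}} \hat{h}^{(k)}_{nz,p,i_{k,\text{true}}}$. Let us define $\mathcal{K}_i = \{k : \mathcal{L}^{(k)}_{\text{true}} = \mathcal{L}_i\}$ for $i = 1, \dots, M$. Then it follows that, with $\lambda_i = \Pr\{\mathcal{L}^{(k)}_{\text{true}} = \mathcal{L}_i\}$,

\begin{align}
\lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big)
&= \lim_{K\to\infty} \frac{1}{K}\sum_{i=1}^{M}\sum_{k\in\mathcal{K}_i} f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i}\big) \tag{38}\\
&= \sum_{i=1}^{M} \lim_{K\to\infty} \frac{|\mathcal{K}_i|}{K}\,\frac{1}{|\mathcal{K}_i|}\sum_{k\in\mathcal{K}_i} f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i}\big) \tag{39}\\
&= \sum_{i=1}^{M} \lambda_i\, E\big\{f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i}\big) \,\big|\, \mathcal{L}^{(k)}_{\text{true}} = \mathcal{L}_i\big\} \tag{40}\\
&= E\big\{f\big(y_d^{(k)}, \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big)\big\}. \tag{41}
\end{align}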

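Hence [15, Theorem 2] can be applied to find the achievable rates for our decoupled decoding scheme under the support genie. In particular, by rewriting the data observations from (25) as

$$ y_d^{(k)} = \sqrt{\rho}\,\mathcal{D}\big(x_d^{(k)}\big) J_d \hat{h}^{(k)}_{f,p,i_{k,\text{true}}} + e^{(k)}_{d,i_{k,\text{true}}} \tag{42} $$

for effective noise $e^{(k)}_{d,i_{k,\text{true}}} = \sqrt{\rho}\,\mathcal{D}\big(x_d^{(k)}\big) J_d \tilde{h}^{(k)}_{f,p,i_{k,\text{true}}} + v_d^{(k)}$, it follows [15] that the achievable rate (in bits per channel use) is

$$ \mathcal{R}(\rho) = \frac{1}{N} E\Big\{\log\det\Big[I + \rho\, C^{-1}_{d,i_{k,\text{true}}}(\rho)\, \mathcal{D}\big(J_d \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big) R_d\, \mathcal{D}\big(J_d \hat{h}^{(k)}_{f,p,i_{k,\text{true}}}\big)^H\Big]\Big\}, \tag{43} $$

where $C_{d,i}(\rho) = \text{cov}\{e^{(k)}_{d,i}\}$ for $e^{(k)}_{d,i}$ as defined earlier. Similar to (38)-(41), we can rewrite (43) as

$$ \mathcal{R}(\rho) = \frac{1}{N}\sum_{i=1}^{M} \lambda_i\, E\Big\{\log\det\Big[I + \rho\, C^{-1}_{d,i}(\rho)\, \mathcal{D}\big(J_d \hat{h}^{(k)}_{f,p,i}\big) R_d\, \mathcal{D}\big(J_d \hat{h}^{(k)}_{f,p,i}\big)^H\Big] \,\Big|\, \mathcal{L}^{(k)}_{\text{true}} = \mathcal{L}_i\Big\}. \tag{44} $$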

When $\mathcal{L}^{(k)}_{\text{true}} = \mathcal{L}_i$, Lemma 2 specifies that there exists some constant $C$ such that $E\{\|\tilde{h}^{(k)}_{f,p,i}\|^2\} \leq C\rho^{-1}$ for all $\rho$. In this case, the eigenvalues of $C_{d,i}(\rho)$ will be positive and bounded from above for all $\rho$, and thus the eigenvalues of $C^{-1}_{d,i}(\rho)$ will be positive and bounded from below for all $\rho$. Thus, using a standard high-SNR analysis (see, e.g., [18] for details), the $i$th conditional expectation in (44), normalized by $N\log\rho$, converges to $1 - P/N$ as $\rho\to\infty$ for any $i$, from which the stated result of this lemma follows. ■

In Q, it has been shown that, for $L$-length non-sparse channels, PAT can be designed to achieve data rates that satisfy $\lim_{\rho\to\infty} \mathcal{R}(\rho)/\log\rho = 1 - P/N$, for $P \geq L$. Our Lemma 3 can be interpreted as an extension of the result from Q to $L$-length $S$-sparse channels with known support.
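As a rough illustration of the high-SNR step in the proof of Lemma 3 (and not the argument of the cited reference), suppose that the $(N-P)\times(N-P)$ Hermitian matrix multiplying $\rho$ inside the log-determinant has eigenvalues confined to a fixed interval $[a, b]$ with $0 < a \leq b < \infty$. The symbols $A(\rho)$, $a$, and $b$ are introduced here only for this sketch; in the proof the matrix is random, so such bounds hold only in an averaged sense.

```latex
(N-P)\log(1+\rho a) \;\le\; \log\det\bigl[I + \rho A(\rho)\bigr] \;\le\; (N-P)\log(1+\rho b)
\quad\Longrightarrow\quad
\lim_{\rho\to\infty} \frac{\tfrac{1}{N}\log\det\bigl[I + \rho A(\rho)\bigr]}{\log\rho}
  = \frac{N-P}{N} = 1 - \frac{P}{N}.
```

Each of the $N-P$ data dimensions contributes one $\log\rho$ term, while the $P$ pilot dimensions contribute none, which is where the pre-log factor $1 - P/N$ comes from.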

IV. Channel-Support Decoding 

In summary, the PAT scheme of Section III-A and the decoupled decoder of Section III-C will suffice for spectrally efficient communication over the sparse frequency-selective block-fading channel if we can establish a reliable means of determining the correct support (i.e., $i$ such that $\mathcal{L}_i = \mathcal{L}_{\text{true}}$). In this section, we consider schemes for reliably decoding the channel support of each block.

A. Data-Aided Support Decoding 

In this section, we show that, with prime $N$, the pilot-aided transmission (PAT) scheme defined in Section III-A is spectrally efficient for the sparse frequency-selective block-fading channel. In other words, when the $L$-length channel is $S$-sparse, it is sufficient to sacrifice only $S$ signal-space dimensions to maintain an achievable rate that grows at the same rate as channel capacity in the high-SNR regime. To show this, we construct a so-called data-aided support decoder (DASD) that leverages certain error-detecting capabilities in the codebook $\mathcal{C}$. We first describe the error detection mechanism and later propose a procedure for channel support decoding.

In our DASD scheme, we attach error-detection parity bits, which we refer to as cyclic redundancy check (CRC) bits, to the information bits prior to the channel-coding operation. Attaching parity bits to the information bits is a commonly used mechanism to identify decoding errors at the receiver [19]. Let us denote the information bit rate as $R$, and the CRC bit rate as $\delta$, both in units of bits per channel use. Then, over $m = KN$ channel uses, we use a total of $mR$ bits for information and a total of $m\delta$ bits for CRC. Let $\mu(\cdot)$ denote the function which specifies the $m\delta$ parity bits for every set of $mR$ information bits. Specifically, $\mu : \{1, \dots, 2^{mR}\} \to \{1, \dots, 2^{m\delta}\}$ is a "binning function" mapping information bits to corresponding CRC bits, so that, for the information message $w$, the corresponding CRC bits are $u = \mu(w)$. Such $u$ is sometimes referred to as the "auxiliary check message." The channel-encoder then maps the "composite message" $(w, u)$, containing $m(R+\delta)$ bits, to one of the $2^{m(R+\delta)}$ codewords in the codebook $\mathcal{C}$. (See Section III-A for details on the codebook.) For clarity, we use "message" when referring to channel-coder inputs, and "codeword" when referring to channel-coder outputs.
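A toy version of the binning function can make the roles of $w$, $u$, and $\mu$ concrete. The sketch below is illustrative only: it draws a random binning table, forms a composite message, and counts how many wrong messages would pass the check. The names `make_binning`, `composite_message`, and `passes_check`, the 0-based message indices, and the toy sizes ($mR = 10$, $m\delta = 4$) are assumptions made for this example.

```python
import random


def make_binning(num_messages, num_bins, seed=0):
    """Random binning: assign each message index an independent uniform bin."""
    rng = random.Random(seed)
    return [rng.randrange(num_bins) for _ in range(num_messages)]


def composite_message(w, mu):
    """Attach the auxiliary check message u = mu(w) to the information message w."""
    return (w, mu[w])


def passes_check(w_hat, u_hat, mu):
    """Error detection: the pair is accepted only if mu(w_hat) equals u_hat."""
    return mu[w_hat] == u_hat


# Toy sizes: m*R = 10 information bits and m*delta = 4 check bits.
mu = make_binning(num_messages=2 ** 10, num_bins=2 ** 4)
w, u = composite_message(37, mu)

# Fraction of wrong messages w' != w that would slip past the check with the same u.
misses = sum(passes_check(w_alt, u, mu) for w_alt in range(2 ** 10) if w_alt != w)
print(misses / (2 ** 10 - 1))  # averages to 2**-4 over random binnings
```

With $m\delta$ check bits, a wrong message lands in the right bin with probability $2^{-m\delta}$ on average, which is the quantity used later in the analysis of DASD.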

The DASD support decoding procedure is defined as follows. 

For each hypothesis of support index $i = (i_1, \dots, i_K) \in \{1, \dots, M\}^K$,

1) Compute conditional channel estimates $\{\hat{h}^{(k)}_{f,p,i_k}\}_{k=1}^{K}$ and $\{\Sigma_{f,p,i_k}\}_{k=1}^{K}$ using (17)-(18) with $\hat{h}^{(k)}_{f,p,i_k} = \sqrt{N} F_{i_k} \hat{h}^{(k)}_{nz,p,i_k}$ and $\Sigma_{f,p,i_k} = N F_{i_k} \Sigma_{nz,p,i_k} F_{i_k}^H$.

2) Compute the WMD codeword estimate $\hat{x}_{d,i}$ according to (24).

3) From the codeword $\hat{x}_{d,i}$, recover the corresponding composite message $(\hat{w}_i, \hat{u}_i)$.

4) Perform error detection on $(\hat{w}_i, \hat{u}_i)$, i.e., check if $\mu(\hat{w}_i) \neq \hat{u}_i$.

5) If no error is detected or there are no more hypotheses to consider, stop and declare the decoded message as $\hat{w}_i$, else continue with the next hypothesis $i$.
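The control flow of the five steps above can be sketched as follows. This is only a skeleton under stated assumptions: the callable `decode` stands in for steps 1)-3) (channel estimation, WMD decoding, and codeword-to-message mapping), which are not implemented here, and the names `dasd`, `decode`, and `mu` as well as the 0-based support indices are hypothetical.

```python
from itertools import product


def dasd(num_supports, num_blocks, decode, mu):
    """Return (w_hat, i_stop) following the DASD stopping rule.

    decode(i) -> (w_hat, u_hat): placeholder for steps 1)-3) under hypothesis i.
    mu(w) -> u: the binning (CRC) function used for error detection.
    """
    # All M**K support-vector hypotheses; exponential in K, so toy sizes only.
    hypotheses = list(product(range(num_supports), repeat=num_blocks))
    for idx, i in enumerate(hypotheses):
        w_hat, u_hat = decode(i)                  # steps 1)-3)
        error_detected = mu(w_hat) != u_hat       # step 4)
        is_last = idx == len(hypotheses) - 1
        if not error_detected or is_last:         # step 5)
            return w_hat, i


# Toy stand-ins: decoding succeeds only under the true support vector.
mu = lambda w: w % 4
true_i = (1, 0)
decode = lambda i: (9, mu(9)) if i == true_i else (10, mu(9))
print(dasd(2, 2, decode, mu))
```

The exhaustive loop over `hypotheses` is what makes DASD a proof device rather than a practical decoder.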

The asymptotic performance of DASD is characterized by the following theorem. 

Theorem 2. For the $S$-sparse frequency-selective $N$-block-fading channel with prime $N$, the previously defined PAT scheme, when used with $S$ pilots and DASD, yields an achievable rate $\mathcal{R}^{\text{DASD}}(\rho)$ that obeys $\lim_{\rho\to\infty} \mathcal{R}^{\text{DASD}}(\rho)/\log\rho = 1 - S/N$. Hence, PAT is spectrally efficient for this channel.

Proof: In our proof, instead of considering a specific binning function $\mu(\cdot)$, we consider the error performance averaged over all possible random binning assignments and establish that the average error approaches zero. For a given support hypothesis $\mathcal{L}_i$, the DASD computes the support-conditional channel estimate and the corresponding WMD codeword estimate from which the composite message bits are obtained, which we write as $(\hat{w}_i, \hat{u}_i)$. There are two situations under which the DASD terminates, producing the final estimate $\hat{w}_{\text{DASD}} = \hat{w}_i$: i) when $i \neq i_{\text{last}}$ and $\mu(\hat{w}_i) = \hat{u}_i$, or ii) when $i = i_{\text{last}}$. Here we use $i_{\text{last}}$ to denote the last of the $M^K$ hypotheses. Note that, in all other cases, an error is detected, and the DASD continues under a different hypothesis $\mathcal{L}_{i'}$.

9 For ease of presentation, we have ignored the flooring $\lfloor mR \rfloor$ and $\lfloor m\delta \rfloor$; the flooring error can be made negligible by choosing a large $m$.

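We now upper bound the probability that the DASD infers the wrong information bits, i.e., that $\hat{w}_{\text{DASD}} \neq w$. Say that $i_{\text{stop}}$ denotes the value of $i$ used to produce $\hat{w}_{\text{DASD}}$, i.e., $\hat{w}_{\text{DASD}} = \hat{w}_{i_{\text{stop}}}$. Notice that either 1) $i_{\text{stop}} = i_{\text{true}}$ or 2) $i_{\text{stop}} \neq i_{\text{true}}$. In the latter case, the support detector fails to detect the true support when either 2a) $i_{\text{stop}} \neq i_{\text{last}}$ and $\mu(\hat{w}_{i_{\text{stop}}}) = \hat{u}_{i_{\text{stop}}}$, where the error was missed, or 2b) $i_{\text{stop}} = i_{\text{last}}$. Finally, notice that, if event 2b occurs, the DASD must have (falsely) detected an error under the true support hypothesis, i.e., $\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}}$. Thus we can partition the error event $\hat{w}_{i_{\text{stop}}} \neq w$ into three mutually exclusive events:

E1) $i_{\text{stop}} = i_{\text{true}}$ and $\hat{w}_{i_{\text{stop}}} \neq w$,

E2) $i_{\text{stop}} = i_{\text{last}} \neq i_{\text{true}}$ and both $\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}}$ and $\hat{w}_{i_{\text{stop}}} \neq w$,

E3) $\exists\, i_{\text{stop}} \notin \{i_{\text{true}}, i_{\text{last}}\}$ s.t. both $\mu(\hat{w}_{i_{\text{stop}}}) = \hat{u}_{i_{\text{stop}}}$ and $\hat{w}_{i_{\text{stop}}} \neq w$.

We now analyze each of these three events.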

Notice that E1 is the event of a data-decoding error under the correct support hypothesis (i.e., $\hat{w}_{i_{\text{true}}} \neq w$). We recall that the correct-support-hypothesis case was analyzed in Section III-C, under which PAT with decoupled decoding was found to be spectrally efficient, having an achievable rate $\mathcal{R}$ that obeys $\lim_{\rho\to\infty} \mathcal{R}(\rho)/\log\rho = 1 - S/N$. Thus, the probability of E1 can be made arbitrarily small for any rates $R$ and $\delta$ such that $R + \delta < \mathcal{R}$.

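E2 characterizes the event in which the true support is falsely discarded and a data-decoding error results later (under an incorrect support hypothesis). Recall that, when the support hypothesis is incorrect, we cannot guarantee a low probability of data-decoding error when communicating at rates that scale as $(1 - S/N)\log\rho$. The key, then, is to make the support-error probability small. Towards this aim, we bound E2 as follows:

\begin{align}
\Pr\{\text{E2}\} &= \Pr\{\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}} \text{ and } \hat{w}_{i_{\text{stop}}} \neq w\} \tag{45}\\
&\leq \Pr\{\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}}\} \tag{46}\\
&= \Pr\{\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}} \mid \hat{w}_{i_{\text{true}}} = w\}\Pr\{\hat{w}_{i_{\text{true}}} = w\} \notag\\
&\quad + \Pr\{\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}} \mid \hat{w}_{i_{\text{true}}} \neq w\}\Pr\{\hat{w}_{i_{\text{true}}} \neq w\} \tag{47}\\
&\leq \Pr\{\mu(\hat{w}_{i_{\text{true}}}) \neq \hat{u}_{i_{\text{true}}} \mid \hat{w}_{i_{\text{true}}} = w\} + \Pr\{\hat{w}_{i_{\text{true}}} \neq w\} \tag{48}\\
&= \Pr\{u \neq \hat{u}_{i_{\text{true}}} \mid \hat{w}_{i_{\text{true}}} = w\} + \Pr\{\hat{w}_{i_{\text{true}}} \neq w\}. \tag{49}
\end{align}

Thus, the probability of E2 can be upper bounded by the probability of decoding error under the correct support hypothesis, which (like $\Pr\{\text{E1}\}$) can be made arbitrarily small for any achievable rate.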

E3 describes the event that both the detection of a support-error is missed and a data-decoding error results. Like with E2, the probability of data-decoding error cannot be made arbitrarily small under an incorrect support hypothesis, and so we hope that the probability of a missed detection is small. Towards this aim, we begin by upper bounding the probability of the event E3 as follows:

\begin{align}
\Pr\{\text{E3}\} &= \Pr\{\exists\, i_{\text{stop}} \notin \{i_{\text{true}}, i_{\text{last}}\} \text{ s.t. } \mu(\hat{w}_{i_{\text{stop}}}) = \hat{u}_{i_{\text{stop}}} \mid \hat{w}_{i_{\text{stop}}} \neq w\}\Pr\{\hat{w}_{i_{\text{stop}}} \neq w\} \tag{50}\\
&\leq \Pr\{\exists\, i \notin \{i_{\text{true}}, i_{\text{last}}\} \text{ s.t. } \mu(\hat{w}_i) = \hat{u}_i \mid \hat{w}_i \neq w\} \tag{51}\\
&\leq \Pr\{\exists\, i \neq i_{\text{true}} \text{ s.t. } \mu(\hat{w}_i) = \hat{u}_i \mid \hat{w}_i \neq w\} \tag{52}\\
&\leq \sum_{i \neq i_{\text{true}}} \Pr\{\mu(\hat{w}_i) = \hat{u}_i \mid \hat{w}_i \neq w\}, \tag{53}
\end{align}

where we used the union bound in (53). Now, to find the probability of missing a support-error, we assume that, when $\hat{w}_i \neq w$, the auxiliary check estimate $\mu(\hat{w}_i)$ is uniformly distributed over all possibilities of $u$. This can be justified by letting the function $\mu$ be constructed by a random binning assignment of the codewords onto $2^{m\delta}$ bins, and averaging over the ensemble of random binning assignments [20]. In this case, for any $i \neq i_{\text{true}}$, the probability of missing the detection of a support-error becomes

$$ \Pr\{\mu(\hat{w}_i) = \hat{u}_i \mid i \neq i_{\text{true}}, \hat{w}_i \neq w\} = 2^{-m\delta} \tag{54} $$
so that

$$ \Pr\{\text{E3}\} \leq \frac{M^K}{2^{m\delta}} = \frac{M^K}{2^{KN\delta}} = \left(\frac{M}{2^{N\delta}}\right)^{K}. \tag{55} $$

So, when $\delta > \frac{\log_2 M}{N}$, by choosing $K$ large enough, we can make $\Pr\{\text{E3}\}$ averaged over all the random binning CRC assignments arbitrarily small. This implies that there exists a binning function $\mu$ for which $\Pr\{\text{E3}\}$ can be made arbitrarily small.
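To see how quickly the bound (55) decays, it can be evaluated for a few block counts. The snippet below is a small numerical sketch; the function names `min_crc_rate` and `e3_bound` and the parameter values ($L = 4$, $S = 2$, $N = 7$, $\delta = 0.5$) are illustrative assumptions, not values from the paper.

```python
import math


def min_crc_rate(M, N):
    """CRC rate threshold log2(M)/N (bits per channel use) above which (55) decays in K."""
    return math.log2(M) / N


def e3_bound(M, N, K, delta):
    """Right-hand side of (55): (M / 2**(N*delta))**K."""
    return (M / 2.0 ** (N * delta)) ** K


M = math.comb(4, 2)   # M = 6 candidate supports for L = 4, S = 2
N = 7
delta = 0.5           # chosen above the threshold min_crc_rate(M, N)
print(min_crc_rate(M, N))
for K in (1, 10, 100):
    print(K, e3_bound(M, N, K, delta))
```

Because the threshold $\log_2 M / N$ does not depend on $\rho$, the CRC overhead costs nothing in the pre-log factor.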

Notice that the rate $\delta$ sacrificed to make $\Pr\{\text{E3}\}$ arbitrarily small does not grow with SNR $\rho$. As long as we choose the SNR-dependent information rate $R(\rho) < \mathcal{R}(\rho) - \delta$, where $\mathcal{R}(\rho)$ is an achievable rate for the sparse channel with known support described in Lemma 3, we can construct a codebook that guarantees arbitrarily small values for $\Pr\{\text{E1}\} + \Pr\{\text{E2}\}$. This codebook, when used in conjunction with the binning function $\mu$, ensures that $\Pr\{\hat{w}_{\text{DASD}} \neq w\} = \Pr\{\text{E1}\} + \Pr\{\text{E2}\} + \Pr\{\text{E3}\}$ can be made arbitrarily small. Since $\delta$ is fixed with respect to SNR $\rho$, the information rate of DASD satisfies $\lim_{\rho\to\infty} R(\rho)/\log\rho = 1 - S/N$. ■

As we have seen, the DASD achieves the optimal pre-log factor, albeit at complexity¹⁰ $O(|\mathcal{C}| M^K + |\mathcal{C}| M K N^2)$, which may be larger than that of the optimal decoder specified in Lemma 1. In fact, we do not propose DASD for practical use, but rather as a constructive means of proving the achievability of the optimal pre-log factor, since the optimal decoder is difficult to analyze directly. In the next section, we present a simpler suboptimal decoding scheme that also has performance guarantees.

B. Pilot-Aided Support Decoding 

In this section, we propose a pilot-aided support decoder (PASD) with complexity¹¹ $O(|\mathcal{C}| K N^2 + K M P^2)$, which is significantly less complex than both DASD and the optimal decoder in Lemma 1. Since only pilots are used to infer the channel support, the complexity of support estimation grows linearly in $K$. PASD, however, requires one additional pilot dimension relative to DASD (i.e., $P = S + 1$) and is only asymptotically reliable (i.e., the probability of support-detection error vanishes as $\rho \to \infty$ but is not guaranteed to be arbitrarily small at any finite $\rho$) unless the channel support is fixed over fading blocks $k \in \{1, \dots, K\}$.

10 Note that the term to the right of the sum in the WMD decoder metric (24) must be computed for every triple $(i, k, x_d)$, where the complexity of each computation is $O(N^2)$. Subsequently, these terms must be summed for each of $M^K$ support-vector hypotheses.

11 As described below, for support estimation, $K$ instances of $\hat{i}_p^{(k)}$ must be computed, each with complexity $O(MP^2)$. Then, for (support-conditional) WMD decoding, $|\mathcal{C}|K$ instances of the term after the sum in (24) must be computed, each with complexity $O(N^2)$.
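The two complexity orders can be compared directly by plugging in numbers. This is a back-of-the-envelope sketch that ignores all constants; the function names `dasd_cost` and `pasd_cost` and the toy sizes (codebook of 16 codewords, $L = 4$, $S = 2$, $K = 3$, $N = 7$) are assumptions for illustration.

```python
import math


def dasd_cost(codebook_size, L, S, K, N):
    """Operation count |C| M^K + |C| M K N^2 for DASD (constants ignored)."""
    M = math.comb(L, S)
    return codebook_size * M ** K + codebook_size * M * K * N ** 2


def pasd_cost(codebook_size, L, S, K, N):
    """Operation count |C| K N^2 + K M P^2 for PASD with P = S + 1 pilots."""
    M = math.comb(L, S)
    P = S + 1
    return codebook_size * K * N ** 2 + K * M * P ** 2


print(dasd_cost(16, 4, 2, 3, 7), pasd_cost(16, 4, 2, 3, 7))  # 17568 2514
```

The gap widens rapidly with $K$, since the DASD count contains the term $M^K$ while the PASD count is linear in $K$.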
1) Pilot-Aided Support Estimation: We now present an asymptotically reliable method to infer the channel support $\mathcal{L}^{(k)}$ that requires only $P = S + 1$ pilots per fading block. For this, we use the following normalized pilot observations:

$$ z_p^{(k)} \triangleq \frac{1}{\sqrt{\rho N}}\,\mathcal{D}(x_p^*)\, y_p^{(k)} = F^{(k)}_{p,\text{true}} h^{(k)}_{nz} + \frac{1}{\sqrt{\rho N}}\,\nu_p^{(k)}, \tag{56} $$

where $\nu_p^{(k)} \sim \mathcal{CN}(0, I)$ due to the constant-modulus assumption on the pilots. Recalling that $F^{(k)}_{p,\text{true}}$ is constructed from rows $\mathcal{N}_p$ and columns $\mathcal{L}^{(k)}_{\text{true}}$ of $F$, and that $F_{p,i}$ is constructed from rows $\mathcal{N}_p$ and columns $\mathcal{L}_i$ of $F$, we henceforth use $\Pi_{p,i} \triangleq F_{p,i}(F_{p,i}^H F_{p,i})^{-1} F_{p,i}^H$ to denote the matrix that projects onto the column space of $F_{p,i}$, and $\Pi^{\perp}_{p,i} \triangleq I - \Pi_{p,i}$ to denote its orthogonal complement.

The pilot-aided support estimator (PASE) infers the support index as that which minimizes the energy of the projection error $e^{(k)}_{p,i}$:

$$ \hat{i}_p^{(k)} \triangleq \arg\min_{i \in \{1, \dots, M\}} \|e^{(k)}_{p,i}\|^2 \quad \text{for} \quad e^{(k)}_{p,i} \triangleq \Pi^{\perp}_{p,i}\, z_p^{(k)}. \tag{57} $$

Clearly, the complexity of PASE is proportional to $M = \binom{L}{S} = O((L/S)^S)$. Thus, while the complexity of PASE is much less than that of the DASD proposed in Section IV-A, we note that its complexity may be significantly larger than that of classical compressive sensing algorithms like basis pursuit, whose complexity is polynomial in $L$ [21].
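The estimator in (57) can be prototyped in a few lines. The sketch below is a minimal illustration, assuming the observation model $z_p = F_{p,\text{true}} h_{nz} + \nu_p/\sqrt{\rho N}$ from (56); the projection is computed by Gram-Schmidt rather than the explicit matrix formula, and all function names, the pilot rows $\{0, 1, 2\}$, and the sizes $N = 7$, $L = 4$, $S = 2$ are hypothetical choices.

```python
import cmath
import random
from itertools import combinations


def dft_columns(N, rows, cols):
    """Columns of the unitary N-DFT matrix restricted to `rows`, one list per column."""
    scale = N ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * r * c / N) for r in rows] for c in cols]


def inner(a, b):
    """Inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def residual_energy(columns, z):
    """||(I - Pi) z||^2, with Pi the projector onto span(columns) (Gram-Schmidt)."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            coeff = inner(q, w)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = inner(w, w).real ** 0.5
        basis.append([wi / norm for wi in w])
    e = list(z)
    for q in basis:
        coeff = inner(q, e)
        e = [ei - coeff * qi for ei, qi in zip(e, q)]
    return inner(e, e).real


def pase(z, N, L, S, pilot_rows):
    """Return the S-element support whose column space leaves the least residual energy."""
    supports = list(combinations(range(L), S))
    return min(supports, key=lambda sup: residual_energy(dft_columns(N, pilot_rows, sup), z))


def pilot_observation(N, pilot_rows, support, h_nz, rho=None, rng=random):
    """z_p = F_{p,true} h_nz + nu / sqrt(rho N); the noise term is skipped if rho is None."""
    cols = dft_columns(N, pilot_rows, support)
    z = [sum(h * col[r] for h, col in zip(h_nz, cols)) for r in range(len(pilot_rows))]
    if rho is not None:
        sigma = (2 * rho * N) ** -0.5  # per-real-dimension std of nu / sqrt(rho N)
        z = [zi + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for zi in z]
    return z


random.seed(0)
N, L, S = 7, 4, 2                      # illustrative sizes, N prime
pilot_rows = (0, 1, 2)                 # P = S + 1 pilots
true_support = (1, 3)
h_nz = [complex(random.gauss(0, 1), random.gauss(0, 1)) / (2 * S) ** 0.5 for _ in range(S)]
z = pilot_observation(N, pilot_rows, true_support, h_nz)
print(pase(z, N, L, S, pilot_rows))
```

The exhaustive search over all $M$ supports mirrors the complexity remark above; supports here are returned as tuples of tap indices rather than as an index $i$.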

Theorem 3. For the $S$-sparse frequency-selective $N$-block-fading channel with prime $N$, and the previously defined PAT scheme with $P \geq S + 1$ arbitrarily placed pilots, the probability of PASE support-detection error vanishes as $\rho \to \infty$.

Proof: We first note that, due to the Chebotarev theorem [10], [11], each $F_{p,i} \in \mathbb{C}^{P \times S}$ is full rank when $N$ is prime and $P \geq S + 1$. Also, each column $f$ of $F_{p,i}$ is linearly independent of all columns of any $F_{p,j}$ that are not equal to $f$. Thus, each $F_{p,i}$ defines a unique column space. We note that this property does not hold when $P = S$.

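A PASE support-detection error results when $\exists\, i \neq i^{(k)}_{\text{true}}$ s.t. $\|e^{(k)}_{p,i}\|^2 < \|e^{(k)}_{p,\text{true}}\|^2$. The probability of this event can be upper bounded as follows,

\begin{align}
&\Pr\big\{\exists\, i \neq i^{(k)}_{\text{true}} \text{ s.t. } \|e^{(k)}_{p,i}\|^2 < \|e^{(k)}_{p,\text{true}}\|^2\big\} \notag\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\big\{\|e^{(k)}_{p,i}\|^2 < \|e^{(k)}_{p,\text{true}}\|^2\big\} \tag{58}\\
&\quad= \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big\|\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}} h^{(k)}_{nz} + \tfrac{1}{\sqrt{\rho N}}\Pi^{\perp}_{p,i}\nu_p^{(k)}\big\| < \tfrac{1}{\sqrt{\rho N}}\big\|\Pi^{\perp}_{p,\text{true}}\nu_p^{(k)}\big\|\Big\} \tag{59}\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big\|\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}} h^{(k)}_{nz}\big\| - \tfrac{1}{\sqrt{\rho N}}\big\|\Pi^{\perp}_{p,i}\nu_p^{(k)}\big\| < \tfrac{1}{\sqrt{\rho N}}\big\|\Pi^{\perp}_{p,\text{true}}\nu_p^{(k)}\big\|\Big\} \tag{60}\\
&\quad= \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big\|\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}} h^{(k)}_{nz}\big\| < \tfrac{1}{\sqrt{\rho N}}\big\|\Pi^{\perp}_{p,i}\nu_p^{(k)}\big\| + \tfrac{1}{\sqrt{\rho N}}\big\|\Pi^{\perp}_{p,\text{true}}\nu_p^{(k)}\big\|\Big\} \tag{61}\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big\|\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}} h^{(k)}_{nz}\big\| < \tfrac{2}{\sqrt{\rho N}}\big\|\nu_p^{(k)}\big\|\Big\}, \tag{62}
\end{align}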

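where the probability of error in (60) was upper-bounded by making the left side of the inequality smaller via $\|x\| - \|y\| \leq \|x + y\|$. The upper bound (62) follows from $\|\Pi^{\perp}_{p,i}\nu_p^{(k)}\| \leq \|\nu_p^{(k)}\|$ and $\|\Pi^{\perp}_{p,\text{true}}\nu_p^{(k)}\| \leq \|\nu_p^{(k)}\|$, which hold because $\Pi^{\perp}_{p,i}$ and $\Pi^{\perp}_{p,\text{true}}$ are projection matrices. Taking the SVD $\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}} = U_i^{(k)} \Sigma_i^{(k)} V_i^{(k)H}$ and defining $g_i^{(k)} \triangleq \sqrt{S}\, V_i^{(k)H} h^{(k)}_{nz} \sim \mathcal{CN}(0, I)$, we can rewrite (62) as follows and upper bound further:

\begin{align}
&\Pr\big\{\exists\, i \neq i^{(k)}_{\text{true}} \text{ s.t. } \|e^{(k)}_{p,i}\|^2 < \|e^{(k)}_{p,\text{true}}\|^2\big\} \notag\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big\|\Sigma_i^{(k)} g_i^{(k)}\big\|^2 < \tfrac{4S}{\rho N}\big\|\nu_p^{(k)}\big\|^2\Big\} \tag{63}\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\Big\{\big(\sigma^{(k)}_{i,0}\big)^2 \big|g^{(k)}_{i,0}\big|^2 < \tfrac{4S}{\rho N}\big\|\nu_p^{(k)}\big\|^2\Big\} \tag{64}\\
&\quad= \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\bigg\{\frac{|g^{(k)}_{i,0}|^2}{\|\nu_p^{(k)}\|^2} < \frac{4S}{(\sigma^{(k)}_{i,0})^2 \rho N}\bigg\} \tag{65}\\
&\quad\leq \sum_{i \neq i^{(k)}_{\text{true}}} \Pr\bigg\{\frac{|g^{(k)}_{i,0}|^2}{\|\nu_p^{(k)}\|^2} < \frac{4S}{(\sigma^{(\min)}_{0})^2 \rho N}\bigg\}. \tag{66}
\end{align}

Above, $\sigma^{(k)}_{i,0}$ denotes the largest singular value in $\Sigma_i^{(k)}$ and $\sigma^{(\min)}_{0} \triangleq \min_i \sigma^{(k)}_{i,0}$. Notice that at least one of the columns of $F^{(k)}_{p,\text{true}}$ lies outside the column space of $F_{p,i}$. The projection of those columns onto the subspace orthogonal to the column space of $F_{p,i}$ will be non-zero, implying that $\Pi^{\perp}_{p,i} F^{(k)}_{p,\text{true}}$ is not identical to $0$ and hence the largest singular value $\sigma^{(k)}_{i,0} > 0$, $\forall k$. Since $g^{(k)}_{i,0} \sim \mathcal{CN}(0, 1)$ is independent of $\nu_p^{(k)} \sim \mathcal{CN}(0, I)$, the random variable $F_i^{(k)} \triangleq |g^{(k)}_{i,0}|^2 / \|\nu_p^{(k)}\|^2$ is (up to a constant scale factor) F-distributed with parameters $(2, 2P)$. Since the cumulative distribution function (cdf) of an F-distributed random variable vanishes as its argument (in this case, $\frac{4S}{(\sigma^{(\min)}_{0})^2 \rho N}$) approaches zero, the probability of a PASE error vanishes as $\rho \to \infty$. ■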
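The last step of the proof can be made concrete, because the ratio $|g|^2 / \|\nu\|^2$ has a simple closed-form cdf: with $|g|^2$ exponentially distributed with unit mean and $\|\nu\|^2$ Gamma-distributed with shape $P$, one gets $\Pr\{|g|^2/\|\nu\|^2 < t\} = 1 - (1+t)^{-P}$. The sketch below evaluates this expression and cross-checks it by Monte Carlo; the function names, the assumed value $\sigma_0^{(\min)} = 0.3$, and the sizes are illustrative choices, not quantities from the paper.

```python
import random


def ratio_cdf(t, P):
    """Pr{ |g|^2 / ||nu||^2 < t } for g ~ CN(0,1) independent of nu ~ CN(0, I_P)."""
    return 1.0 - (1.0 + t) ** (-P)


def ratio_cdf_monte_carlo(t, P, trials=20000, seed=1):
    """Empirical check: |g|^2 is Exp(1) and ||nu||^2 is Gamma(P, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        numerator = rng.expovariate(1.0)
        denominator = rng.gammavariate(P, 1.0)
        hits += numerator < t * denominator
    return hits / trials


S, N = 2, 7
P = S + 1
sigma_min = 0.3  # assumed value of the smallest "largest singular value"
for rho in (1e1, 1e2, 1e3, 1e4):
    threshold = 4 * S / (sigma_min ** 2 * rho * N)
    print(rho, ratio_cdf(threshold, P))
```

For small thresholds the cdf behaves like $P\,t$, so each term of the bound (66) decays roughly as $1/\rho$.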

We now make a few comments about Theorem 3. To perfectly recover any arbitrary deterministic $S$-sparse impulse response from noise-free frequency-domain samples, [10] established that $2S$ pilot tones are both necessary and sufficient. In contrast, to perfectly recover an $S$-sparse probabilistic Rayleigh-fading impulse response, Theorem 3 establishes that $S + 1$ noise-free pilot observations suffice with probability one. In particular, the condition $P \geq S + 1$ ensures that the set of $h_{nz}$ that cannot be recovered by the PASE support detector has probability zero with respect to the Gaussian distribution on the nonzero entries of $h_{nz}$. To see this, notice that $\text{rank}(F_{p,i}) = \text{rank}(F_{p,j}) = S$, but also that $\text{range}(F_{p,i}) = \text{range}(F_{p,j})$ only if $i = j$. In particular, if $i \neq j$, then $\dim\{\text{range}(F_{p,i}) \cap \text{range}(F_{p,j})\} \leq S - 1$, with equality when $P = S + 1$. This implies that the set of vectors $h_{nz} \in \mathbb{C}^S$ for which $F_{p,i} h_{nz}$ is in the range space of $F_{p,j}$ has measure zero with respect to any continuous distribution on $h_{nz}$. Similar results on the recovery of probabilistic sparse signals have also appeared in [22].

2) Pilot-Aided Support Decoding: For pilot-aided support decoding, we assume that the transmitter uses the PAT scheme defined in Section III-A with $P = S + 1$ pilots and prime $N$. At the receiver, the PASE scheme described in the previous section is used to estimate the sparse channel support and, based on this estimate, support-conditional channel estimation and decoupled data decoding are performed as described in Section III-C.

We now study the $\epsilon$-achievable rate of PAT with PASD. For some $\epsilon > 0$ and SNR $\rho$, let $\mathcal{R}_\epsilon(\rho)$ denote the information rate for which the probability of decoding error can be made less than $\epsilon$. Lemma 4 characterizes $\mathcal{R}_\epsilon(\rho)$ for PAT with PASD.

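Lemma 4. For the $S$-sparse frequency-selective $N$-block-fading channel with prime $N$, the previously defined PAT scheme, when used with $S + 1$ pilots and PASD, yields an $\epsilon$-achievable rate $\mathcal{R}^{\text{PASD}}_\epsilon(\rho)$ that, for any $\epsilon > 0$, obeys $\lim_{\rho\to\infty} \mathcal{R}^{\text{PASD}}_\epsilon(\rho)/\log\rho = 1 - \frac{S+1}{N}$.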

Proof: From Theorem 3 we know that, under the conditions stated in the lemma, there exists, for any $\epsilon > 0$, an SNR $\rho_\epsilon$ above which the error of PASE is less than $\epsilon/2$. In the case that the support hypothesis is correct, the channel estimation and decoupled decoding of Section III-C allow for the design of a codebook $\mathcal{C}_{\rho,\epsilon}$ that guarantees data decoding with error probability less than $\epsilon/2$ at SNR $\rho$. Furthermore, from Lemma 3, this codebook can be designed with a rate $R_\epsilon(\rho)$ such that $\lim_{\rho\to\infty} R_\epsilon(\rho)/\log\rho = 1 - \frac{S+1}{N}$. Putting these together, we obtain the result of the lemma. ■
We note that, for any given finite SNR $\rho$, it is not possible to make $\epsilon$, the PASD error probability, arbitrarily small. Thus, the achievable rate $\mathcal{R}(\rho)$ of PAT with PASD equals zero for any finite $\rho$. This behavior contrasts with that of PAT with DASD, which had positive achievable rate for all $\rho > 0$.

Recall that, with the sparse block-fading channel model assumed throughout the paper, the channel support $\mathcal{L}^{(k)}$ changes independently over fading blocks $k$. We now consider a variation of this channel for which the support does not change¹² over $k$. For this fixed-support channel, it is possible to modify PASE so that it recovers the support $\mathcal{L}$ with an arbitrarily small probability of error at any SNR $\rho > 0$, leading to the following corollary of Lemma 4.

Corollary 3. For the $S$-sparse frequency-selective $N$-block-fading channel with prime $N$ and a support $\{\mathcal{L}^{(k)}\}_{k=1}^{K}$ that is constant over the fading block index $k$, the previously defined PAT scheme, when used with $S + 1$ pilots and PASD, yields an achievable rate $\mathcal{R}^{\text{PASD}}(\rho)$ that obeys $\lim_{\rho\to\infty} \mathcal{R}^{\text{PASD}}(\rho)/\log\rho = 1 - \frac{S+1}{N}$.

Proof: For this channel, we use PASE with the metric $\sum_{k=1}^{K} \|e^{(k)}_{p,i}\|^2$ in place of the metric $\|e^{(k)}_{p,i}\|^2$ from (57). With this modification, we obtain an error probability upper bound analogous to (66), but where the F-distributed random variable has parameters $(2K, 2K(S+1))$. In particular,

$$ \Pr\bigg\{\exists\, i \neq i_{\text{true}} \text{ s.t. } \sum_{k=1}^{K}\|e^{(k)}_{p,i}\|^2 < \sum_{k=1}^{K}\|e^{(k)}_{p,\text{true}}\|^2\bigg\} \leq \sum_{i \neq i_{\text{true}}} \Pr\bigg\{\frac{\sum_{k=1}^{K}|g^{(k)}_{i,0}|^2}{\sum_{k=1}^{K}\|\nu_p^{(k)}\|^2} < \frac{4S}{(\sigma^{(\min)}_{0})^2 \rho N}\bigg\}. \tag{67} $$

For an F-distributed random variable with parameters $(2K, 2K(S+1))$, the value of the cdf at any fixed point decreases with $K$. Thus, by choosing a suitably large $K$, we can make the PASE support-detection error arbitrarily small at any SNR $\rho > 0$. The result of this corollary then follows from Lemma 3. ■

V. Conclusion 

In this paper, we considered the problem of communicating reliably over frequency-selective block-fading channels whose impulse responses are sparse and whose realizations are unknown to both transmitter and receiver, but whose statistics are known. In particular, we considered discrete-time channel impulse responses with length $L$ and sparsity exactly $S < L$, whose support and coefficients remain fixed over blocks of $N > L$ channel uses but change independently from block to block.

12 Although the support remains fixed over $k$, the nonzero channel taps $h^{(k)}_{nz}$ still vary independently over $k$.

Assuming that the non-zero coefficients and noise are both Gaussian, we first established that the ergodic noncoherent channel capacity $C_{\text{sparse}}(\rho)$ obeys $\lim_{\rho\to\infty} C_{\text{sparse}}(\rho)/\log\rho = 1 - S/N$ for any $L$. Then, we shifted our focus to pilot-aided transmission (PAT), where we constructed a PAT scheme and a so-called data-aided support decoder (DASD) that together enable communication with arbitrarily small error probability using only $S$ pilots per fading block. Furthermore, we showed that the achievable rate $\mathcal{R}^{\text{DASD}}(\rho)$ of this pair exhibits the optimal pre-log factor, i.e., $\lim_{\rho\to\infty} \mathcal{R}^{\text{DASD}}(\rho)/\log\rho = 1 - S/N$. The use of $S$ pilots can be contrasted with "compressed OFDM channel sensing," for which $O(S \ln^5 N)$ pilots are known to suffice for accurate channel estimation (with high probability) in the presence of noise, and for which $2S$ pilots are known to be necessary and sufficient for perfect channel estimation in the absence of noise.

Due to the complexity of DASD, we also proposed a simpler pilot-aided support decoder (PASD) that requires only $S + 1$ pilots per fading block. For PASD, the $\epsilon$-achievable rate $\mathcal{R}^{\text{PASD}}_\epsilon(\rho)$ obeys, for any $\epsilon > 0$, $\lim_{\rho\to\infty} \mathcal{R}^{\text{PASD}}_\epsilon(\rho)/\log\rho = 1 - \frac{S+1}{N}$ with the previously considered channel, and its achievable rate $\mathcal{R}^{\text{PASD}}(\rho)$ obeys $\lim_{\rho\to\infty} \mathcal{R}^{\text{PASD}}(\rho)/\log\rho = 1 - \frac{S+1}{N}$ when the sparsity pattern of the block-fading channel remains fixed over fading blocks. We note that, in recent work [11], [12], the authors have proposed a loopy belief propagation based joint channel estimation and decoding scheme, with complexity $O(KLN)$, that shows empirical performance that matches the anticipated pre-log factor of $1 - S/N$.

The results of this work are only a first step towards the understanding of reliable communication 
over sparse channels. Important open questions concern rigorous analyses of the cases that i) the 
inactive channel taps are not exactly zero-valued, ii) the channel has at most (rather than exactly) $S$
active taps, iii) the receiver does not know the channel statistics, iv) the channel taps are correlated 
within and/or across blocks, and/or v) the channel taps are non-Gaussian. 
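Appendix A
Proof of Lemma 1

Proof: The maximum a posteriori (MAP) codeword estimate is defined as

\begin{align}
\hat{x}_d^{\text{MAP}} &\triangleq \arg\max_{x_d \in \mathcal{C}}\; p\big(x_d \,\big|\, \{y_d^{(k)}\}_{k=1}^{K}, \{y_p^{(k)}\}_{k=1}^{K}, x_p\big) \tag{68}\\
&= \arg\max_{x_d \in \mathcal{C}}\; p\big(\{y_d^{(k)}\}_{k=1}^{K} \,\big|\, x_d, \{y_p^{(k)}\}_{k=1}^{K}, x_p\big)\, p(x_d), \tag{69}
\end{align}

where (69) results after applying Bayes rule and simplifying. Assuming that codewords are uniformly distributed over $\mathcal{C}$, the MAP codeword estimate reduces to the maximum likelihood estimate

\begin{align}
\hat{x}_d^{\text{ML}} &= \arg\max_{x_d \in \mathcal{C}}\; p\big(\{y_d^{(k)}\}_{k=1}^{K} \,\big|\, x_d, \{y_p^{(k)}\}_{k=1}^{K}, x_p\big) \tag{70}\\
&= \arg\max_{x_d \in \mathcal{C}} \prod_{k=1}^{K}\sum_{i=1}^{M} \Pr\{\mathcal{L}^{(k)} = \mathcal{L}_i \mid x_d^{(k)}, y_p^{(k)}, x_p\} \notag\\
&\quad\times \int_{h_{nz}^{(k)}} p\big(y_d^{(k)} \,\big|\, x_d^{(k)}, y_p^{(k)}, x_p, h_{nz}^{(k)}, \mathcal{L}^{(k)} = \mathcal{L}_i\big)\, p\big(h_{nz}^{(k)} \,\big|\, x_d^{(k)}, y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\big) \tag{71}\\
&= \arg\max_{x_d \in \mathcal{C}} \prod_{k=1}^{K}\sum_{i=1}^{M} \Pr\{\mathcal{L}^{(k)} = \mathcal{L}_i \mid y_p^{(k)}, x_p\} \notag\\
&\quad\times \int_{h_{nz}^{(k)}} p\big(y_d^{(k)} \,\big|\, x_d^{(k)}, h_{nz}^{(k)}, \mathcal{L}^{(k)} = \mathcal{L}_i\big)\, p\big(h_{nz}^{(k)} \,\big|\, y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\big), \tag{72}
\end{align}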
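where the decoupling in (71) is due to independent fading and noise across fading-blocks. Recalling that, under the hypothesis $\mathcal{L}^{(k)} = \mathcal{L}_i$, the pilot observations become

$$ y_p^{(k)} = \sqrt{\rho N}\,\mathcal{D}(x_p) F_{p,i} h_{nz}^{(k)} + v_p^{(k)}, \tag{73} $$

with $p(h_{nz}^{(k)} \mid \mathcal{L}^{(k)} = \mathcal{L}_i) = \mathcal{CN}(h_{nz}^{(k)}; 0, S^{-1}I)$, the posterior $p(h_{nz}^{(k)} \mid y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i)$ is Gaussian. In particular,

$$ p\big(h_{nz}^{(k)} \,\big|\, y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\big) = \mathcal{CN}\big(h_{nz}^{(k)}; \hat{h}^{(k)}_{nz,p,i}, \Sigma_{nz,p,i}\big), \tag{74} $$

where $\hat{h}^{(k)}_{nz,p,i}$ can be recognized as the $\mathcal{L}_i$-conditional pilot-aided MMSE estimate of $h_{nz}^{(k)}$ and $\Sigma_{nz,p,i}$ as its error covariance:

\begin{align}
\hat{h}^{(k)}_{nz,p,i} &= E\{h_{nz}^{(k)} \mid y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\} \tag{75}\\
\Sigma_{nz,p,i} &= \text{cov}\{h_{nz}^{(k)} \mid y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\}. \tag{76}
\end{align}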
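Due to the linear Gaussian model (73), the MMSE estimate $\hat{h}^{(k)}_{nz,p,i}$ is a linear function of $y_p^{(k)}$:

\begin{align}
\hat{h}^{(k)}_{nz,p,i} &= E\{h_{nz}^{(k)} y_p^{(k)H} \mid x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\}\, E\{y_p^{(k)} y_p^{(k)H} \mid x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\}^{-1} y_p^{(k)} \tag{77}\\
&= \tfrac{\sqrt{\rho N}}{S} F_{p,i}^H \mathcal{D}(x_p^*)\Big(\tfrac{\rho N}{S}\mathcal{D}(x_p) F_{p,i} F_{p,i}^H \mathcal{D}(x_p^*) + I\Big)^{-1} y_p^{(k)} \tag{78}\\
&= \tfrac{1}{\sqrt{\rho N}}\Big(F_{p,i}^H F_{p,i} + \tfrac{S}{\rho N} I\Big)^{-1} F_{p,i}^H \mathcal{D}(x_p^*)\, y_p^{(k)}, \tag{79}
\end{align}

where, for (79), we exploited the fact that $x_p$ has constant-modulus elements. Similarly,

\begin{align}
\Sigma_{nz,p,i} &= E\{h_{nz}^{(k)} h_{nz}^{(k)H} \mid \mathcal{L}^{(k)} = \mathcal{L}_i\} - E\{h_{nz}^{(k)} y_p^{(k)H} \mid x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\} \notag\\
&\quad\times E\{y_p^{(k)} y_p^{(k)H} \mid x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\}^{-1} E\{y_p^{(k)} h_{nz}^{(k)H} \mid x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\} \tag{80}\\
&= \tfrac{1}{S} I - \tfrac{\rho N}{S^2} F_{p,i}^H \mathcal{D}(x_p^*)\Big(\tfrac{\rho N}{S}\mathcal{D}(x_p) F_{p,i} F_{p,i}^H \mathcal{D}(x_p^*) + I\Big)^{-1} \mathcal{D}(x_p) F_{p,i}. \tag{81}
\end{align}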
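The step from (78) to (79) can be checked with the push-through identity $A^H(cAA^H + I)^{-1} = (cA^HA + I)^{-1}A^H$. The following short derivation is a sketch; the shorthand $A$ is introduced here only for compactness and does not appear in the original.

```latex
% Shorthand (assumption for this sketch): A = \sqrt{\rho N}\,\mathcal{D}(x_p) F_{p,i}
S^{-1}A^H\bigl(S^{-1}AA^H + I\bigr)^{-1}
  = \bigl(S^{-1}A^HA + I\bigr)^{-1} S^{-1}A^H
  = \bigl(A^HA + S I\bigr)^{-1}A^H,
\qquad
A^HA = \rho N\, F_{p,i}^H \mathcal{D}(x_p^*)\mathcal{D}(x_p) F_{p,i} = \rho N\, F_{p,i}^H F_{p,i},
\\[4pt]
\bigl(\rho N\, F_{p,i}^H F_{p,i} + S I\bigr)^{-1}\sqrt{\rho N}\, F_{p,i}^H \mathcal{D}(x_p^*)
  = \tfrac{1}{\sqrt{\rho N}}\Bigl(F_{p,i}^H F_{p,i} + \tfrac{S}{\rho N} I\Bigr)^{-1} F_{p,i}^H \mathcal{D}(x_p^*).
```

The constant-modulus property enters only through $\mathcal{D}(x_p^*)\mathcal{D}(x_p) = I$, which removes the pilot symbols from the matrix to be inverted and reduces the inverse from size $P \times P$ to size $S \times S$.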

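Finally, since both pdfs in (72) are Gaussian, the integral can be evaluated in closed form, reducing to (see, e.g., [23])

\begin{align}
&\int_{h_{nz}^{(k)}} p\big(y_d^{(k)} \,\big|\, x_d^{(k)}, h_{nz}^{(k)}, \mathcal{L}^{(k)} = \mathcal{L}_i\big)\, p\big(h_{nz}^{(k)} \,\big|\, y_p^{(k)}, x_p, \mathcal{L}^{(k)} = \mathcal{L}_i\big) \notag\\
&\quad= C \det\Big(\rho N F_{d,i}^H \mathcal{D}\big(x_d^{(k)} \odot x_d^{(k)*}\big) F_{d,i} + \Sigma_{nz,p,i}^{-1}\Big)^{-1} \notag\\
&\qquad\times \exp\Big(-\big\|y_d^{(k)} - \sqrt{\rho N}\,\mathcal{D}(x_d^{(k)}) F_{d,i}\, \hat{h}^{(k)}_{nz,i}(x_d^{(k)})\big\|^2 - \big\|\hat{h}^{(k)}_{nz,i}(x_d^{(k)}) - \hat{h}^{(k)}_{nz,p,i}\big\|^2_{\Sigma_{nz,p,i}^{-1}}\Big), \tag{83}
\end{align}

where $C$ does not depend on $x_d^{(k)}$, and where $\hat{h}^{(k)}_{nz,i}(x_d^{(k)})$ denotes the MMSE estimate of $h_{nz}^{(k)}$ conditioned on the data hypothesis $x_d^{(k)}$ and based on the pilot-aided prior statistics (74):

\begin{align}
\hat{h}^{(k)}_{nz,i}(x_d^{(k)}) &= \hat{h}^{(k)}_{nz,p,i} + \sqrt{\rho N}\,\Sigma_{nz,p,i} F_{d,i}^H \mathcal{D}(x_d^{(k)*})\Big(\rho N\, \mathcal{D}(x_d^{(k)}) F_{d,i} \Sigma_{nz,p,i} F_{d,i}^H \mathcal{D}(x_d^{(k)*}) + I\Big)^{-1} \notag\\
&\quad\times \big(y_d^{(k)} - \sqrt{\rho N}\,\mathcal{D}(x_d^{(k)}) F_{d,i}\, \hat{h}^{(k)}_{nz,p,i}\big). \tag{84}
\end{align}

■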
$y^{(k)}$, $y_f^{(k)}$ : observation vector in time domain, in frequency domain
$h^{(k)}$, $h_f^{(k)}$ : channel vector in time domain, in frequency domain
$x^{(k)}$, $x_f^{(k)}$ : data vector in time domain, in frequency domain
$v^{(k)}$, $v_f^{(k)}$ : AWGN vector in time domain, in frequency domain
$y_p^{(k)}$, $y_d^{(k)}$ : pilot, data portions of frequency-domain observation vector
$x_p$, $x_d^{(k)}$ : pilot, data portions of frequency-domain data vector
$v_p^{(k)}$, $v_d^{(k)}$ : pilot, data portions of frequency-domain noise vector
$\mathcal{N}_p$, $\mathcal{N}_d$ : pilot, data subcarrier index sets
$h_{nz}^{(k)}$ : non-zero portion of time-domain channel vector
$\mathcal{L}^{(k)}$ : set of channel-support indices for $k$th block
$\mathcal{L}_i$ : set of channel-support indices for $i$th hypothesis
$F^{(k)}_{p,\text{true}}$ : unitary DFT matrix restricted to true columns $\mathcal{L}^{(k)}$ and rows $\mathcal{N}_p$
$F_i$ : unitary DFT matrix restricted to columns $\mathcal{L}_i$
$F_{p,i}$ : unitary DFT matrix restricted to pilot rows $\mathcal{N}_p$ and columns $\mathcal{L}_i$
$F_{d,i}$ : unitary DFT matrix restricted to data rows $\mathcal{N}_d$ and columns $\mathcal{L}_i$
$\hat{h}^{(k)}_{f,p,i}$, $\tilde{h}^{(k)}_{f,p,i}$ : $\mathcal{L}_i$-conditional pilot-based MMSE estimate of $h_f^{(k)}$, associated error
$\hat{h}^{(k)}_{nz,p,i}$, $\tilde{h}^{(k)}_{nz,p,i}$ : $\mathcal{L}_i$-conditional pilot-based MMSE estimate of $h_{nz}^{(k)}$, associated error
$e^{(k)}_{d,i}$ : $\mathcal{L}_i$-conditional effective noise on $y_d^{(k)}$ for WMD decoding
$z_p^{(k)}$ : normalized pilot observations used for PASE
$\nu_p^{(k)}$ : normalized AWGN on $z_p^{(k)}$ used for PASE
$e^{(k)}_{p,i}$ : $\mathcal{L}_i$-conditional projection error vector used for PASE

TABLE I
Review of commonly used variables, where $(\cdot)^{(k)}$ denotes dependence on $k$th fading block.